How One Eng Team Works
As Coinbase Engineering has scaled, we’ve grown into several high-functioning, collaborative teams. Scaling our teams while ensuring operational excellence has required technical strength and lightweight processes that actively improve over time. There’s no shortage of philosophies or manifestos on how to operate a team, but rarely do we understand how the sausage is made without making it ourselves. This post provides a snapshot of how one of our engineering teams works today.
Over the last 12 months, our Infrastructure team has more than tripled in size, accelerated developer productivity through 19,119 deployments, codified our infrastructure, and begun to open source our stack, all while operating one of the most valuable clouds in tech. As an exercise to improve operational excellence (and security), we recently tested our automation by rebuilding our entire infrastructure from scratch in a 24-hour period without downtime. This is how we work.
How We Work
We believe engaged engineering teams are fueled by transparency, and that transparency begins with how we plan to work together. Transparency is hard to maintain at scale, and verbal feedback is even harder to distribute at scale, so wherever possible we write things down. Our team operations start with a written one-pager titled “How We Work”, which the team evolves through sprint retrospectives and through ad-hoc proposals that everyone is encouraged to share at any time. You can find a generalized version of our one-pager here.
When choosing the big projects we’ll tackle each quarter, we optimize for a combination of vision, creativity and buy-in from our team, and that starts with our quarterly plan. Before a quarter begins, the team brainstorms creative, high-leverage projects for the upcoming quarter. These are captured in a transparent backlog that is always open to new ideas from anyone in the company. The manager ensures that other teams’ priorities and company priorities are communicated both to the team and in this backlog, which we turn into a plan at our quarterly offsite. You can find a sample project backlog and planning template similar to what we use here.
At our quarterly offsite we take an objective look at how we’re doing, what we’re trying to accomplish, and leave with a draft of our next quarter’s roadmap. Some of the topics we’ve found most valuable to cover at our offsite include:
- A data-driven review of the past quarter, covering trends in our team and company KPIs. We ask whether we’re collecting the right metrics and ensure there’s a willing owner for each.
- Hard-to-collect or expensive metrics, like engineering satisfaction or NPS, are gathered through surveys that we review at the offsite. This integrates feedback from our customers directly into our plan.
- Reviewing our company and team mission statements, revising where appropriate. After first getting our team mission right, we have typically only changed a few words each quarter to better reflect any changes in scope.
- Asking ourselves both what makes our team different and where we’re the same as our peers. This helps us invent only what must be invented while collaborating with larger teams through open source (or purchased) services.
With that context fresh in our minds, we use the second half of the offsite to vote on priorities and collaboratively estimate scope. Giving the whole team a strong voice in the roadmap is critical for both engagement and creative solutions. Sourcing ideas from a diverse team of practitioners leads to better outcomes than top-down direction, and this transparent, inclusive process has worked well for us.
Leaving our offsite, we are careful to note that our plan isn’t final yet. Any disagreements are put on hold while we collect more data (maintaining a bias for action in the meantime), and the plan still needs to be lined up with plans from other teams. From this point on, we’ve rarely had to change more than scheduling, and the team is typically rallied around our next big milestones.
We rely on our roadmap to coordinate big projects across the team, plan milestones and publicly share our priorities. It also shows what work isn’t getting done. Throughout a quarter, our roadmap is a valuable guide but never law; we’re always taking in new information about our fast-paced industry, company and team. As projects and information evolve, we avoid getting stuck on sunk costs by regularly asking ourselves whether we’re working on the right things for the right reasons. We update our roadmap regularly so other teams and people can anticipate any schedule changes that affect them.
What we’re not doing is just as important as what we plan to do, and our roadmap helps communicate this clearly to other teams. When your team doesn’t have the bandwidth for a project because of existing, higher priorities, your roadmap is the right place to communicate this. Rather than telling a customer “no”, we tell them “not right now, because we think these priorities on our roadmap are more important”. If the customer wants to challenge those priorities, hear them out; they just might be right. We share our roadmap with the entire company, and you can see an example of our roadmap here.
Don’t overdo your roadmap. If you’re small, you probably don’t need to collaborate with other teams or have crisp priorities. A few bullet points or post-it notes might work just fine. Ours has grown from a whiteboard to a one-column spreadsheet to this.
Types of Projects
Not all projects can be treated the same, and separating them up front has helped us appreciate the breadth of our responsibilities. We break projects into 3 types, acknowledging that we don’t just spend our time shipping new, bug-free systems. Here we’re influenced by The Phoenix Project, which has helped us maintain both high velocity and a high bar for quality. That quality bar matters because it lets the team take pride in its work. Here’s how we break down our projects:
- New Projects — The Phoenix Project separates internal projects from external projects but we group these together in our quarterly planning for simplicity. Before any project begins, the project lead shares a one-pager with a defined problem, goal and metrics. We avoid oversharing implementation details — we align on what we’re solving and avoid designing by committee.
- Operations Projects — Tech debt, vulnerabilities and legacy are born when you create something without a plan to operate it. We plan for operations upfront and wherever possible automate this work away. To minimize toil, we look for repetitive manual tasks and encourage the team to judiciously automate them.
- Unplanned Projects — Some events can’t be predicted, but we can be sure that something will always disrupt even our best plans. Outages, support requests and security issues need prompt attention, and if they aren’t managed they can grind your team to a halt. We have 24x7 primary and secondary on-calls whose weeks are cleared to deal with unplanned projects. If it’s a quiet week, the on-call engineers again automate away toil. Wherever possible, we route unplanned requests through a Service Desk to maintain a paper trail between shifts. Urgent service interruptions go through our incident response process and conclude with a post-mortem.
When we make decisions, we want them influenced by cold, hard facts instead of hidden biases or the HiPPO (Highest Paid Person’s Opinion). We measure 20 KPIs across our team and share them publicly, both on the web and on dashboards around the office. KPI owners often check these daily or even more often. The team then uses Sprint Reviews to ask whether our KPIs show any new trends that require action and whether we’re collecting the right KPIs. This helps the whole team spot problems and recognize when those problems should influence priorities.
Good metrics can be expensive to collect, but the expense is often worth it. Before we start a new project, we think hard about how we can measure our goals. Without a clear metric it’s harder to build the right solution, communicate progress and rally towards a milestone.
On a recent large project to codify our infrastructure, an engineer decided that the percentage of our infrastructure that was codified would best reflect the team’s progress, but there was no easy way to measure it. Knowing the project would take a large part of the team’s quarter, we decided to invest time in the metric before the solution. The engineer extended our open source GeoEngineer to measure codified percentage, launched a new internal service that regularly updated the figure, and presented it on a new dashboard. This took time, but the dashboard became an invaluable metric that the team rallied behind and used to communicate progress on a complicated project to the rest of the company. It was an expensive metric and totally worth it.
When we think of an ideal workday, we want to be engaged and productive. Our sprints are designed to be lightweight so we can focus on doing work that matters. In two-week increments, we collaborate through the following syncs:
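At its core, a codified-percentage metric is a ratio: of the resources actually running in the cloud, what share is also declared in code? The Python sketch below is not GeoEngineer’s implementation. It assumes, hypothetically, that you can already produce two inventories (live resources from your cloud provider and resources declared in your infrastructure code), each keyed by resource type and ID. `Resource` and `codified_percentage` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in sets
class Resource:
    """A cloud resource identified by its type and ID (hypothetical model)."""
    kind: str
    resource_id: str


def codified_percentage(live: set[Resource], codified: set[Resource]) -> float:
    """Percentage of live resources that are also declared in code."""
    if not live:
        return 100.0  # nothing is running, so nothing is left to codify
    return 100.0 * len(live & codified) / len(live)


# Hypothetical inventories: one from the cloud provider, one parsed from code.
live = {
    Resource("ec2_instance", "web-1"),
    Resource("ec2_instance", "web-2"),
    Resource("elb", "web"),
    Resource("s3_bucket", "logs"),
}
codified = {
    Resource("ec2_instance", "web-1"),
    Resource("elb", "web"),
    Resource("s3_bucket", "logs"),
    Resource("s3_bucket", "retired"),  # declared in code but not live; ignored here
}
print(f"codified: {codified_percentage(live, codified):.1f}%")  # prints "codified: 75.0%"
```

Resources that exist in code but not in the cloud don’t count toward the metric in this sketch; tracking them separately could help surface drift in the other direction.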
- Sprint Planning — Our sprints begin on Tuesdays with Sprint Planning. Before the meeting, we share a regular report of all open/stale pull requests that engineers might choose to prioritize in the upcoming sprint. Engineers prepare for Sprint Planning by triaging, prioritizing and estimating new tickets. As a team we review any updated company priorities, move a realistic amount of work (relative to what we’ve accomplished in prior sprints) into the sprint plan and kick things off.
- Sprint Retrospective — The final Thursday of each sprint includes a short retrospective where each member shares what went well, what didn’t and what they’ve learned this sprint. As a team we select one (and only one) thing to improve in the following sprint, which could be a change to How We Work. This focused evolution minimizes the need for excessive follow-up or planning, and it is how we evolve alongside our growth and challenges.
- Sprint Review — We use the closing Monday of a sprint to review our KPIs and openly demo any newly shipped systems, features or successes. We find this to be a motivating checkpoint to share momentum and learn from our peers.
- Daily Standup — The shortest but perhaps most important part of how we work. At 1pm daily, the team convenes on an (optional) Hangout and elects a standup leader, who passes a (virtual) conch around the team and takes notes in Slack, all in under 5 minutes. These standups are neither run by nor for management; they are for the team to openly share progress, blockers and plans.
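The sprint cadence above (planning on a Tuesday, the retrospective on the final Thursday, the review on the closing Monday) can be sketched as a small calendar helper, together with the Sprint Planning rule of committing to a realistic amount of work relative to prior sprints. This is a hypothetical Python illustration: `sprint_calendar` and `sprint_capacity` are made-up names, and averaging the completed points of the last three sprints (a heuristic often called “yesterday’s weather”) is an assumption, not necessarily how this team estimates.

```python
from datetime import date, timedelta
from statistics import mean

SPRINT_DAYS = 14  # two-week sprints


def sprint_calendar(start: date) -> dict[str, date]:
    """Return the dates of the main syncs for a sprint beginning on `start`."""
    if start.weekday() != 1:  # date.weekday(): Monday == 0, Tuesday == 1
        raise ValueError("sprints are assumed to start on a Tuesday")
    return {
        "planning": start,                                   # first Tuesday
        "retrospective": start + timedelta(days=9),          # final Thursday
        "review": start + timedelta(days=SPRINT_DAYS - 1),   # closing Monday
    }


def sprint_capacity(completed_points: list[int], window: int = 3) -> int:
    """Estimate how much work to commit to (assumed heuristic).

    Averages the points completed in the last `window` sprints.
    """
    if not completed_points:
        raise ValueError("need at least one completed sprint")
    return round(mean(completed_points[-window:]))


if __name__ == "__main__":
    print(sprint_calendar(date(2024, 1, 2)))  # 2 January 2024 is a Tuesday
    print(sprint_capacity([20, 24, 18, 22]))
```

Averaging a short window smooths out one unusually good or bad sprint while still tracking recent changes, such as a growing team.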
Coinbase is working towards a big mission to create an open financial system for the world. There are too many novel challenges now and in our future to accomplish this with top-down direction or a narrow set of perspectives. Instead, we rely on diverse teams from around the world and inclusive communication. Inclusive communication doesn’t always come naturally; we encourage, educate, and train our leaders to bring out the best in our teams. This means ensuring that all voices are heard and all perspectives are taken seriously.
This breakdown of how one of our engineering teams works today is the product of many months of iteration. It has changed over time and, by design, will continue to do so. It isn’t meant to be a recipe for how you should operate, but we hope you find some value in this snapshot. We care about building the best teams and are hiring engineers, engineering managers and product managers to help us scale. If you’re interested in learning more and building the future of finance, please get in touch!
Thanks to Roman Shtylman, John Yi, Diogo Monica and Josh Ellithorpe for reviewing early drafts of this.