Sunday, July 31, 2011

Feature Branches are Not Evil. In fact, they Rock.

There have been several notable blog posts lately decrying the faults of feature branches. Both Martin Fowler and Jez Humble have blogged on this topic. Their position contravenes both the proven success of massive projects like the Linux kernel, which has made forking a way of life, and the tenets of lean thinking, which tell us to do value-added tasks (here, merging) just in time. Their argument is that feature branches accumulate code destined for release in multiple places, and that this fragmentation impedes the benefits of continuous integration, makes refactoring harder, and generally inhibits communication via code. These code quality assurance practices should be pushed earlier upstream, but premature integration is not the best way to do it.

Both authors appear to assume an enterprise development setting where all contributors are committers, and they offer no explanation for why their advice rejects the standard practice of the major open source communities that wholesale ignore it. The rise of distributed SCMs such as git happened specifically to enable the fork-modify-merge development pattern, and the pull request has become an accepted best practice. Git was specifically designed to support the Linux kernel's development process, which uses forking on a massive scale.

I will argue that there are good reasons to use feature branches in both enterprise and open source settings, and that there are good mitigations for the problems they discuss. I see feature branches as a form of inventory: they should be minimized in both duration and quantity, but, like inventory, they exist in the real world to mitigate risk and variation.

I recommend a model where we create feature branches based on "minimal marketable features" (MMFs). By definition, such a feature is atomic in that no proper subset of it has value to users. In a corporate setting, a team should follow the advice to "stop starting and start finishing" by allocating available developers first to in-progress MMFs, until those are "full". Only when we can't do this should we pull a new feature from the queue. By following these two practices, we guarantee that the feature branches supporting MMFs are of minimal duration and multiplicity. I don't claim this alone addresses the issues raised by Fowler and Humble, but it goes a long way towards avoiding them. You simply should not be working on epics; instead, break them down into MMFs, put those in your backlog, and work like hell to finish anything you start as quickly as possible -- not by "trying harder" but by managing the process to keep it lean: do not start something new if you can instead make something in-flight finish sooner.

I'd also like to point out that, whether we work in an open source community or an enterprise setting, we really don't know that a feature will be successfully completed and incorporated into the product until it is. Software development involves risk; industry surveys have long reported that a large fraction of software projects fail or are cancelled. Sometimes we encounter technical difficulties. Sometimes "the business" changes its mind about what it wants. Sometimes there are re-organizations. Sometimes we try to innovate and fail. Sometimes the lead developer gets sent to jail for murdering his wife, or gets hit by a bus. In open source settings, the maintainers get final say on what goes in, and they have to turn away contributions that don't pass muster. They try to accept contributions, but they have to maintain their standards and design integrity. If you are a contributor, there is no absolute guarantee your work will be merged until it is.

Short of a feature failing outright, delay is a much more common risk. If we have two feature branches in flight, we simply don't know which one will ship first. We can estimate and predict, but we can't let ourselves become trapped by those predictions. When multiple features are in flight at once, we should remove all non-essential work from the value path and let whoever gets there first win. The other branches then have to deal with rebasing, but notice that the work of merging only impacts one of the branches. I'll talk more later about how to reduce the pain of losing the race.
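To make the race concrete, here is a minimal sketch in a throwaway repo (branch and file names are illustrative): the branch that finishes first merges to main untouched, and only the losing branch pays the catch-up cost by rebasing.

```shell
#!/bin/sh
set -e
# Demo setup: a throwaway repo with two feature branches racing from main.
# (git >= 2.28 for `init -b`.)
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "base"
git branch feature-a
git branch feature-b

git checkout -q feature-a
echo "A" > a.txt; git add a.txt; git commit -q -m "feature A"
git checkout -q feature-b
echo "B" > b.txt; git add b.txt; git commit -q -m "feature B"

# feature-a wins the race: it merges to main with no extra work.
git checkout -q main
git merge -q --no-edit feature-a

# Only the losing branch deals with catching up, via rebase.
git checkout -q feature-b
git rebase -q main
```

After the rebase, feature-b carries feature-a's commit beneath its own, and main was never disturbed by the slower branch.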

If we apply "lean thinking", we want to move quality functions earlier, and do value-added activities at the last responsible moment. The problem with the positions taken by Humble and Fowler is that they treat a value-added activity as a quality-improving activity, so they try to push it upstream, doing it early and often. That's why they focus on refactoring, communication, and merge collision detection instead of focusing on minimizing the lead time to the next feature and the total cycle time. Letting the team that ships first skip the merge entirely helps both.

Suppose we have features A and B, but deep into development on B we realize there is a much better solution and that what we really want is C, so we scrap B entirely. This happens all the time in the real world, and it highlights that premature merging is a form of code inventory, and a form of waste. It's bad enough that both teams had to deal with merging code from A and B, when the minimum-waste solution would be to impact the cycle time of only one of them. But here, since B is cancelled, the merge turned out to be non-value-added work, and worse, it was actively harmful: to ship A we have to back out the code for B. If you are a developer on A, you are pretty pissed off by this.
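The back-out is real work in git. A minimal sketch in a throwaway repo (names are illustrative) of what the A team inherits when B's prematurely merged code has to come out:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "base"

# Feature B was merged into main prematurely...
git checkout -q -b feature-b
echo "B" > b.txt; git add b.txt; git commit -q -m "feature B"
git checkout -q main
git merge -q --no-ff --no-edit feature-b

# ...and then cancelled, so the merge must be reverted before A can ship.
# (-m 1 picks main's side of the merge as the state to return to.)
git revert --no-edit -m 1 HEAD
```

The revert commit now sits in main's history as pure overhead: non-value-added work that exists only because B was integrated before it was done.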

When you work on MMFs, which are worthless until done, the notion that feature branches accumulate code "destined for release" is an optimistic but unproven assertion. We obviously wish it were certain, but we also need to realize that shipping code that doesn't actually function is waste, especially if the quantity of such code grows over time. Dead code bloats and confuses our codebase, and is technical debt.

I know that "feature flags" are all the rage now, but we need to be clear about what their benefits are. The point is not to let us release partial, non-working code and hide it. That is a form of resource leak if we don't clean the flags up, especially when features are cancelled or delayed; I don't want 3000 feature flags piling up over the years. Feature flags mitigate the risk associated with change by simplifying back-outs when something goes wrong. More importantly, feature flags allow us to do limited real-world testing in the production environment instead of some expensive staging environment -- eliminating whole non-production environments is a whole lotta waste saved. They also give customers control by enabling pull, so it's their choice when features are activated. Splitting delivery from activation is really the key benefit. We should only deliver working features, a point Humble and Fowler agree with, but recall that if we work on MMFs there are no partially working features.
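The delivery/activation split needs nothing fancy; a sketch, assuming a hypothetical flag named FEATURE_FAST_SEARCH read from the environment:

```shell
#!/bin/sh
# The new code path ships dark; the customer pulls it live by flipping a flag.
new_search() { echo "fast search"; }
old_search() { echo "classic search"; }

search() {
  # A single, well-known switch point keeps the eventual flag cleanup cheap.
  if [ "${FEATURE_FAST_SEARCH:-off}" = "on" ]; then
    new_search
  else
    old_search
  fi
}

search                  # prints "classic search": delivered, not yet activated
FEATURE_FAST_SEARCH=on
search                  # prints "fast search": the customer activated it
```

Backing the change out is the same one-line flip in reverse, which is exactly the risk mitigation the flag buys -- as long as the flag itself is deleted once the feature is fully live.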

OK, so what about the difficulties with continuous integration, refactoring, and communication that Humble and Fowler are worried about? Suppose we end up with feature branches A and B in flight at the same time; how can we make it all work? I agree with Jilles van Gurp, who blogged about git best practices for when we are forced to deal with this. I see two main issues: communication and early conflict detection. Refactoring comes in two forms: either I'm refactoring to support my feature, or I'm paying down technical debt for its own sake. The latter, I would argue, should be treated like any other feature; in particular, refactoring as technical debt cleanup should also follow the MMF model, so that I'm not doing epic refactoring (pun intended). If you follow this, whether a feature involves some refactoring doesn't really matter in the final analysis. Features A and B each introduce some changes, which might conflict. The only question is how you deal with it when they do.

I think the answer is relatively simple. I create a special branch that automatically merges all live feature branches. Merge failures are reported back, and the teams should talk when they happen. Jenkins automates this, as described here. This integration branch is always expected to contain unstable development code. The key difference here is that I am trying to merge every time code is pushed to any feature branch. If one of my features is cancelled, I reset the integration branch based on the new set of in-flight feature branches (dropping the cancelled one). The crucial difference between this solution and what Humble and Fowler advocate is that I never expect this merge branch to be releasable, while they do. When a merge conflict is detected this way, the teams have to talk about what to do. They have options.
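A sketch of that automerge step, in a throwaway repo with illustrative branch names (a Jenkins job would run the merge loop on every push to any feature branch):

```shell
#!/bin/sh
set -e
# Demo setup: main plus two live feature branches.
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "base"
for f in feature-a feature-b; do
  git checkout -q -b "$f" main
  echo "$f" > "$f.txt"; git add "$f.txt"; git commit -q -m "$f"
done

# Rebuild the throwaway integration branch from main, then merge every
# live branch. Dropping a cancelled feature is just removing it from the
# list and rerunning, because the branch is reset from scratch each time.
git checkout -q -B integration main
for f in feature-a feature-b; do
  if ! git merge -q --no-edit "$f"; then
    echo "CONFLICT merging $f -- teams need to talk" >&2
    git merge --abort
    exit 1
  fi
done
```

No one ever ships this branch; it exists purely as an early-warning system for conflicts between in-flight features.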

So in summary, I think MMF style feature branches that "ship when done" work well and don't suffer the problems Fowler and Humble worry about if we use the right tooling to automerge and communicate.

1 comment:

  1. I agree 100% with this and is what I have been trying to do. The only problem I have is Jenkins only merges one branch at a time (the last one that changed) no matter what I do. How do you manage this? Or do you push the merged branch back and manually delete it when there is a feature removed?

    ReplyDelete