Avoid merge commits

This topic can be a bit controversial, so I’ll say up front: what I’m about to outline is not the only way of doing things. There are some really effective Git workflows out there, such as gitflow, that rely on merge commits as containers of information. In some ways I see this type of design as a holdover from earlier days when people used SVN or other older, gnarlier version control systems. On the other hand, there are sometimes good business-, DevOps-, and process-based arguments for having merge commits be part of a project’s history.

When to avoid merge commits

  • If you are a solo developer or even a duo or trio, it should not be even remotely necessary to use merge commits, and you should avoid them as much as you can for the reasons described below. Instead of merging, use git rebase to ensure your commits are fast-forward to your mainline branch before pushing.
  • If you are on a small-ish team (up to 15 active committers or so), you can probably do without merge commits, and the ideas here are likely to benefit you.
  • If you are using a so-called trunk-based workflow, you will find that it’s also quite helpful to keep your merge commits to a minimum. In the trunk-based flow, each developer’s responsibility should be to make sure their commits are fast-forward to the mainline branch before they push to it.
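
As a rough illustration of the rebase-before-integrating habit described in the list above, here is a minimal sketch in a throwaway repository. The branch name feature, the file names, and the identity settings are all made up for the example; in a real project you would typically rebase onto a freshly fetched copy of your mainline branch and then push.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Do some work on a feature branch.
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, main moves ahead.
git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "Other work on main"

# Replay the feature commit on top of main instead of merging main into it.
git checkout -q feature
git rebase -q main

# feature is now a fast-forward of main, so integrating it cannot
# create a merge commit; --ff-only would refuse if it were not.
git checkout -q main
git merge -q --ff-only feature

merge_count=$(git rev-list --merges --count main)
commit_count=$(git rev-list --count main)
echo "merge commits: $merge_count, total commits: $commit_count"
```

Run as written, this should report zero merge commits and three commits in total: the history of main stays a straight line even though work happened on two branches at once.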

Why not merge commits?

Short answer: merge commits are hard to reason about, hide information, and are usually (especially in a smaller team) not necessary.

A merge commit is a commit that has more than one parent ID. In other words, its immediate ancestor is not a single commit. This is kind of hard to wrap one’s head around. Think about it for a moment: before this commit there were two commits…each with their own commit message and purpose. This makes the commit inherently difficult to reason about, as well as difficult to represent and describe.

In turn, your understanding of your commit history and how it relates to that of another commit will not as easily extend back farther than the most recent merge commit. Another downside is that there is no single commit “directly preceding” a merge commit: to check out or reset to what came before it, you first have to decide which of its parents you mean.
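
To make the multiple-parent idea concrete, the sketch below builds a tiny throwaway repository, forces a real merge commit, and then asks Git for that commit’s parents. The branch and file names are invented for the example; HEAD^1 and HEAD^2 are Git’s standard notation for the first and second parent.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"
feature_tip=$(git rev-parse HEAD)

git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "Other work on main"
main_tip=$(git rev-parse HEAD)

# The branches have diverged, so this creates a real merge commit.
git merge -q --no-ff -m "Merge branch 'feature'" feature

# rev-list --parents prints the commit ID followed by its parent IDs,
# so the parent count is the number of fields minus one.
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')
echo "parents: $parent_count"

# "The previous commit" is ambiguous: each parent is a different commit.
first_parent=$(git rev-parse HEAD^1)    # tip of main before the merge
second_parent=$(git rev-parse HEAD^2)   # tip of the branch that was merged in
echo "first parent:  $first_parent"
echo "second parent: $second_parent"
```

The first parent is where main was before the merge and the second is the tip of the merged branch, which is why “go back one commit” has no single answer here.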

Due to certain merging practices/mistakes/snafus, I’ve seen commit histories with multiple commits having the same message sprinkled throughout a branch. This can happen when the developer repeatedly merges branches together in an effort to keep their work abreast of the mainline branch. Honestly I’m not sure how it happens, but I’ve seen it in the wild on more than one occasion with multiple engineers. This is not only perplexing; the resulting history also misrepresents the changes in the commits. Especially if you’re new to Git, or even if you simply don’t understand merging very well, stick with git rebase until you absolutely need to merge for some very specific reason.

Two classes of merge commits

Merge commits you encounter in the wild can be classified into two classes: accidental or ignorant merge commits, and conscientious merge commits.

Accidental or ignorant merge commits

One type of merge commit I call the “accidental” or “ignorant” merge commit. A user creates a merge commit because they don’t know there is any alternative. Usually you can tell these types of merge commits by the default commit message on them. For example, in a branch called main, you might see sporadic merge commits with messages like the following sprinkled among the normal commits:

Merge branch 'main' into branch-name-here

Ever see these and wonder what they are? If you’re looking at the main branch and you’re seeing a message like Merge branch 'main' into ..., that’s awkward, right? Isn’t main the branch you would want to merge other branches into, instead? A good rule of thumb is that if you somehow create one of these in your branch, figure out why it’s there and get rid of it before you push.
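
Here is one possible way to spot and remove such a commit, sketched in a throwaway repository. The branch name branch-name-here is borrowed from the message above; everything else is invented for the example. It relies on the fact that a plain git rebase replays only the ordinary commits and leaves merge commits out.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b branch-name-here
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "Other work on main"

# The habit that produces the accidental merge commit:
# "catching up" by merging main into the working branch.
git checkout -q branch-name-here
git merge -q --no-edit main

# List merge commits that exist on this branch but not on main.
merges_before=$(git rev-list --merges --count main..HEAD)
merge_subject=$(git log --merges --format=%s -n 1 main..HEAD)
echo "before: $merges_before merge commit(s): $merge_subject"

# Rebasing onto main replays only the ordinary commits,
# so the merge commit disappears from the branch.
git rebase -q main

merges_after=$(git rev-list --merges --count main..HEAD)
commits_after=$(git rev-list --count main..HEAD)
echo "after: $merges_after merge commit(s), $commits_after ordinary commit(s)"
```

Keep in mind that rebasing rewrites the branch’s commits, so this is best done before the branch has been pushed or shared with anyone else.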

Conscientious merge commits

Sometimes merge commits are used as a way to keep track of who merged what and when. The argument for this is that knowing when work from others was integrated into the history of a branch is important and valuable information. This could be appropriate if you are working in a large high-profile open-source project (like Linux, PHP, Go, Git, etc.), or a proprietary project that requires a very strict audit trail of how development was done for some legal reason.

The reason not to use merge commits in this way is that we already have that information in the commits themselves. The fact that changes came from some other branch at some particular time is superfluous, noisy, and can be confusing when reviewing commit history.

Rule of thumb: if you don’t have a very good reason to do this, don’t use merge commits in this way. If you want to groom your Git repo, you are much better off taking the following steps (which mostly keep merge commits out of your repo):