This article addresses how engineers can better share context and implementation details in code changes by structuring their commit messages. Concise, informative commits convey vital design decisions during peer review in the short term, feed release notes in the medium term, and save time in the long term when searching the history for key changes and changes that break backwards compatibility.

What Problem Are We Trying To Solve?

Poor commit messages keep all of the context of a code change in the head of the person who wrote it.

Peer reviews on pull requests are more than a rubber stamp to satisfy branch protection rules, and more than “looks good to me”; they are a way to share important information on functionality changes, design decisions and limitations encountered.

We seek to share as much context about changes as possible, not just in the code review but in the history of the git log too. Often we use this long-lived record to understand why and how changes were made and the impact they had, likely long after the person who wrote them has left the project.

Pull Requests

We create pull requests to merge branched changes into a codebase; they are an opportunity for feedback and knowledge sharing. Typically a pull request represents a complete change, i.e. a new feature, a bug fix, etc. If there is later a problem with the addition, the whole change can be reverted, rather than sifting through multiple commits until the bad change has been identified and rolled back, potentially leaving half-baked functionality. When merging, it is typical to squash the many commits that make up a single pull request into one.

It is this squashed commit message that holds the most value. I would advise that all commits along the way be sensible, but it is the squashed merge commit that the team will benefit from in the long term; this is the one that should have structure and consistency.

Peer Reviews

How is an engineer to understand what they’re reviewing without appropriate context? The first thing they see when reviewing a pull request is the title and description, but it is the collection of commit messages that forms the final squashed message.

If an engineer seeks to understand a particular merge days or weeks later the majority of the context may have been lost, even by the person who made the changes.

Later down the line, if an engineer is trying to find major or breaking changes, this can be a difficult task.

Why not help ourselves by leaving more than just breadcrumbs?

We can write structured commits that clearly articulate the purpose and outcome of a change. They can be aggregated and form the basis of release notes for wider audiences.

Before looking at how to structure commits, we need to address how engineers in a team currently work.

Breaking Habits

Engineers come with a variety of standards and practices. Commit messages are rarely given attention, but it is when things go wrong that we really need them.

Messages like “fixed test” and “added API response field” are not particularly helpful. Generic messages will get lost in the message swamp.

To break the habit we can lean on a familiar way of working, the Definition of Done (DoD); if you are unfamiliar with DoDs, it is worth reading up on them. In short, a Definition of Done is a shared, agreed standard that a team works to, and it can include criteria such as information sharing and documentation standards.

First the team must understand and agree why rigorous documentation is important. Then you can introduce criteria that promote successful outcomes.

Code cannot be merged unless a second member of the team is confident they understand the changes made and they are clearly documented.

This point hits both knowledge transfer and long term knowledge retention.

Assuming the team agree, it’s time to level up their commits.

Consistency

Having a consistent and agreed commit message structure can introduce an information hierarchy which guides the reader to the most important information. Categories can be added to help reviewers understand from the outset what the changes include - if we are adding new functionality or fixing old. It can also reduce the time and cognitive load to sift through the git history when searching for things like breaking changes.

Rather than reinvent the wheel we can adopt a simple framework called Conventional Commits.

Conventional Commits

A specification for adding human and machine readable meaning to commit messages

From the docs:

The Conventional Commits specification is a lightweight convention on top of commit messages. It provides an easy set of rules for creating an explicit commit history; which makes it easier to write automated tools on top of. This convention dovetails with SemVer, by describing the features, fixes, and breaking changes made in commit messages.

You can of course read the source for the full explanation, so rather than rehash everything I’ll summarise it:

Message Template

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
  • type is a natural key to indicate the kind of change:
    • feat, fix, build, chore, ci, docs, style, refactor, perf, test
  • optional scope, written in parentheses after the type, e.g. css codeblock
  • description of the change, e.g. make codeblock responsive to page width
  • body, the bulk of the context
  • footer, to add structured content which can be machine parsed, e.g.:
    • BREAKING CHANGE: <description>
    • Co-authored-by: Jane <jane@example.com>

This example would read as:

style(css codeblock): make codeblock responsive to page width

Modify CSS to change from wrapping to scroll on screen size <= medium page width

BREAKING CHANGE: Removed codeblock scrolling from large screen format
Co-authored-by: Jane <jane@example.com>

This structure should make it easy to read and understand the change. It’s a stripped down story and before you read the code you should have a good idea of what is going on.
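Because the structure is regular, it is also straightforward for a program to read. The following is a minimal Python sketch of that idea, not a full implementation of the specification: the names parse_commit and ParsedCommit are hypothetical, the regular expressions are simplified, and footers are assumed to be the trailing "Token: value" lines of the message.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Header shape: <type>[(scope)][!]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<bang>!)?: (?P<description>.+)$"
)
# Footer shape: "Token: value". BREAKING CHANGE is the one token containing a space.
FOOTER_RE = re.compile(
    r"^(?P<token>BREAKING CHANGE|[A-Za-z][A-Za-z-]*): (?P<value>.+)$"
)


@dataclass
class ParsedCommit:
    type: str
    scope: Optional[str]
    description: str
    body: str = ""
    footers: Dict[str, str] = field(default_factory=dict)
    breaking: bool = False


def parse_commit(message: str) -> ParsedCommit:
    """Split a commit message into header fields, body and footers."""
    lines = message.strip().splitlines()
    header = HEADER_RE.match(lines[0]) if lines else None
    if header is None:
        raise ValueError("first line is not a conventional commit header")

    # Peel footer lines off the end of the message; whatever remains is the body.
    rest = lines[1:]
    footers: Dict[str, str] = {}
    while rest:
        footer = FOOTER_RE.match(rest[-1])
        if footer is None:
            break
        footers[footer["token"]] = footer["value"]
        rest.pop()

    return ParsedCommit(
        type=header["type"],
        scope=header["scope"],
        description=header["description"],
        body="\n".join(rest).strip(),
        footers=footers,
        # A "!" before the colon or a BREAKING CHANGE footer marks a breaking change.
        breaking=header["bang"] is not None or "BREAKING CHANGE" in footers,
    )


EXAMPLE = """style(css codeblock): make codeblock responsive to page width

Modify CSS to change from wrapping to scroll on screen size <= medium page width

BREAKING CHANGE: Removed codeblock scrolling from large screen format
Co-authored-by: Jane <jane@example.com>"""

commit = parse_commit(EXAMPLE)
print(commit.type, commit.scope, commit.breaking)
```

A parser along these lines is the sort of building block that release-note generators rely on, which is why keeping to the template pays off beyond readability.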

Gaining Context Quickly

Here’s an example with a little lorem ipsum to highlight how we can gain information by scanning quickly:

Let’s say the API we maintain no longer works on the default branch and we want to find out why.

We’ll pretend this message has meaningful content which explains all we need to know.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Arcu bibendum at various vel pharetra vel. Ut enim blandit volutpat maecenas volutpat. Semper auctor neque vitae tempus quam.

To understand and gain the relevant context we would likely need to read most if not all of the text. There is no precedence or information hierarchy. Let’s sprinkle some convention on it:

feat(ipsum): Lorem ipsum dolor sit amet

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Arcu bibendum at various vel pharetra vel. Ut enim blandit volutpat maecenas volutpat. Semper auctor neque vitae tempus quam.

BREAKING CHANGE: Semper auctor neque vitae tempus quam.

We are drawn to two key bits of information and their relevant context:

  • a new feature has been added
  • a breaking change

Assuming the context is indeed concise and meaningful we may have gained all we need to understand why the API no longer works and what changes have been merged. We can read on for the full context or even look at the code diff but it is likely at this point we have what we need and have found where the breaking change originated.

Finding Specific Changes From The git log

It may be necessary to change functionality such as the data structure in an API request; changes like this can break functionality and remove backwards compatibility. They may break downstream consumers yet remain undiscovered (or untested) until much later down the line. Should that happen, we can use the git log to search for them.

You can search the git log for keywords, e.g. BREAKING CHANGE.

git log --grep="BREAKING CHANGE"

--regexp-ignore-case is a useful flag for making the search case-insensitive.

Let’s say I want to find all references to css changes regardless of casing:

git log --grep="css" --regexp-ignore-case
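If you find yourself running these searches often, they are easy to wrap in a small script. The sketch below is only illustrative: the function names git_log_grep_command and search_history are hypothetical, and it assumes git is installed and that repo_path points at a git repository. It uses only the two options shown above.

```python
import subprocess
from typing import List


def git_log_grep_command(pattern: str, ignore_case: bool = False) -> List[str]:
    """Build the argument list for `git log --grep=<pattern>`."""
    command = ["git", "log", f"--grep={pattern}"]
    if ignore_case:
        command.append("--regexp-ignore-case")
    return command


def search_history(pattern: str, ignore_case: bool = False, repo_path: str = ".") -> str:
    """Run the search inside repo_path and return git's output as text."""
    result = subprocess.run(
        git_log_grep_command(pattern, ignore_case),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise if git exits with an error, e.g. outside a repository
    )
    return result.stdout


# Example calls (run from inside a repository):
#   search_history("BREAKING CHANGE")
#   search_history("css", ignore_case=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the search term contains spaces, as BREAKING CHANGE does.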

Practical Application

Use the template in the pull request’s description and copy this to the squashed merge commit message.

You will be helping those who come after you immeasurably, and should anyone git blame something you’ve written, you can feel proud to have left a clean trail clearly outlining why changes happened and any gotchas.

But what if people go against the agreed team ways of working?

Automation

Assuming the team uses the convention, checking and linting is easy to automate. CI services can check the format as part of the merge requirements, so authors can fix any exceptions before asking for a review. Checking formatting standards shouldn’t be the responsibility of the reviewer; it should be automated for simplicity and consistency.

There are services like mergeability which lint your GitHub pull request descriptions. You can include rules and regular expressions to ensure they keep to the team standards.

If you wish to have at least mild enforcement on commit messages, e.g. stating whether a change is a feature (feat), fix or chore, you can use the pre-commit hook conventional-pre-commit.
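To give a feel for what such a check involves, here is a minimal Python sketch of a header lint. It is not how mergeability or conventional-pre-commit are implemented; the function name lint_commit_message, the list of allowed types and the wording of the problems are all assumptions a team would adapt to its own agreed standard.

```python
import re
from typing import List

# Assumed team-agreed list of types; adjust to match your own Definition of Done.
ALLOWED_TYPES = (
    "feat", "fix", "build", "chore", "ci",
    "docs", "style", "refactor", "perf", "test",
)

# <type>[(scope)][!]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?!?: (?P<description>\S.*)$"
)


def lint_commit_message(message: str) -> List[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems: List[str] = []
    lines = message.splitlines()
    header = lines[0] if lines else ""

    match = HEADER_RE.match(header)
    if match is None:
        problems.append("header must look like '<type>(<scope>): <description>'")
    else:
        commit_type = match["type"]
        if commit_type not in ALLOWED_TYPES:
            problems.append(f"unknown type '{commit_type}'")

    # The body, if present, should be separated from the header by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the header and the body")

    return problems
```

A CI job or local hook could call a function like this on the pull request title or commit message and fail when the returned list is not empty, so that formatting problems are caught before a reviewer is involved.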

What Have We Solved?

It is important not to be restrictive in our ways of working but to promote positive results and remove difficulty from the team. Conventional Commits supports that principle and helps engineers by:

  • Enhancing asynchronous information sharing and speeding up peer reviews
  • Increasing the speed of hunting down significant or breaking changes across the codebase and understanding why they happened