Use git to comment your code (and stop writing rubbish commit messages, please)

Posted on 29 Sep 2015 git

Over recent years we've seen the software community debate the usefulness of comments (this article being an example), and rightly so. The main argument is against explanatory comments, i.e. "this code is doing X", as the ideal situation is that the code is written in a way that means it's readable and self-explanatory. The problem with comments like this is that they easily become out of date, as someone makes a quick change to the code without reading and updating the associated comment. You then have the issue of a comment which directly contradicts the code it's meant to be explaining.

Another kind of comment that doesn't belong in the code is one like "I changed this because...", or what I'll call revision comments. I'd argue that these comments are just as prone to becoming out of date and contradictory as explanatory comments, and that they actually belong in the logs of your version control system. Git tracks the way your code changes over time and stores a human-readable description of what changed at each commit. If your commit messages are written with this in mind, they become documentation for the history of your code.

The scourge of lazy commit messages

Have you ever thought about the purpose of your commit message? Do you write it thinking that nobody will ever read it again? Do any of these sound familiar?

updates
fixes
added feature x

I've absolutely been guilty of this in the past. If you consistently write commit messages like the above then you can guarantee that no-one will ever read them, as they're practically useless. Future readers will eventually be able to work out what changed by looking through the diffs of your commits, but they won't necessarily be able to work out why you made those changes. That valuable piece of information exists only in your head, and probably only for a few months at most.

On a more basic level, if you want to reset your codebase to a particular commit, scanning through a series of commits that don't have descriptive names means that you have no choice but to check the diffs. Don't do that to someone - it's mean.

Have you ever been in the situation where someone asks you why you made a particular change, only for you to come up with a total blank? Or, if you reverse the situation, have you ever looked at someone else's code and needed to know why it's evolved the way it has? I know there have been times where I've made some code "simpler", only to find that there was a very specific reason as to why it was written that way, and I've broken it.

The ideal situation is that our commit messages are targeted, clear and relevant, first describing the change and then why it has been made. You can then call on the git logs to describe the changes to a repository, a single file or even a given line in a file. Using these logs can help future developers (and future you) to know if they're about to make a big mistake in changing something that you changed for a very good reason 17 months ago.

Make regular, smaller commits

Although I've come a long way in writing descriptive commit messages, I still sometimes forget to commit regularly when I'm in the flow of things. This means that I end up committing a huge chunk of code at a time, with multiple unrelated changes. When this happens, it's pretty much impossible to write a useful commit message.

If you can't describe the changes in a couple of sentences, it's best to break them down into multiple commits. And if you make multiple changes that are totally independent of each other then they should go in separate commits. This is useful not just for providing clear commit messages, but also if you need to revert the changes introduced by a single commit. I've been in the situation where I've had to undo one half of a large commit that someone made a while back, and I can promise you that it's not fun.

A case study for "commenting" your code with git

Let's say I have a web app that involves a very common task: validating user-submitted passwords. Here's the very simple class that does the job:

class PasswordValidator
  def valid?(password)
    password =~ /^\w{6,}$/
  end
end

Here's the commit message:

commit 9eab442bf5d7f9dcd285412b8281e1bed0ca7cfa
Author: Jon Cairns <jon@joncairns.com>
Date:   Tue Sep 29 15:04:01 2015 +0100

    Add password validator

    Valid passwords are at least 6 characters long and contain only regex
    word characters.

Even if you aren't familiar with Ruby, the commit message explains what the class does at this stage. There's no need to add a comment to the code, as it's very simple. But even if you did want some detail, the git log will always be there, unlike a comment, which can easily be deleted.

NB: I like to write git commit messages in the present tense, imperative style, as recommended by git itself. This is because each commit is a description of how it changes the codebase. So I use "change" instead of "changes" or "changed", and "fix" instead of "fixes" or "fixed".

All code is susceptible to change. And in this case, after some testing, we've realised that we've got a potential bug in the code: there's no maximum limit on the length of the password, but our database column only allows strings of up to 64 characters. To avoid truncation, we update the regular expression:

class PasswordValidator
  def valid?(password)
    password =~ /^\w{6,64}$/
  end
end

tree 882a887ed4b5259ef6e6921119e0aef6a9b04c25
parent 9eab442bf5d7f9dcd285412b8281e1bed0ca7cfa
author Jon Cairns <jon@joncairns.com> Tue Sep 29 15:30:14 2015 +0100
committer Jon Cairns <jon@joncairns.com> Tue Sep 29 15:30:14 2015 +0100

Restrict valid passwords to be 64 characters long

Since the database field has a 64 character limit, passwords should
only be declared valid by the PasswordValidator if they're 64 characters
or fewer.

The commit says not only what changed but, crucially, why it was changed. The code is kept clean, without being littered with comments, but the history of the code is always available on demand.
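To make the boundary in that commit message concrete, here's a minimal sketch that exercises the updated validator at the edges of the allowed range. The labels and sample passwords are my own illustration; only the class itself comes from the example above.

```ruby
# The validator as it stands after the second commit.
class PasswordValidator
  def valid?(password)
    password =~ /^\w{6,64}$/
  end
end

validator = PasswordValidator.new

# =~ returns the match index (0 here) or nil, so coerce to a boolean for display.
results = {
  "5 chars"  => !!validator.valid?("a" * 5),
  "6 chars"  => !!validator.valid?("a" * 6),
  "64 chars" => !!validator.valid?("a" * 64),
  "65 chars" => !!validator.valid?("a" * 65)
}

results.each { |label, ok| puts "#{label}: #{ok}" }
```

Only the 6 and 64 character passwords should come back as valid. One caveat if you adapt this for real use: in Ruby, ^ and $ match at line boundaries rather than at the start and end of the whole string, so \A and \z are probably the safer anchors for validating user input.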

If we carry on in this vein, this class will have a history of detailed and specific commit messages, as opposed to a series of "Updated password validator" messages.
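If you want a nudge towards that habit, something like the following could be called from a commit-msg hook (git passes the path of the message file to that hook as its first argument). This is only a sketch: commit_message_problems is a hypothetical helper, and its rules, including the three-word threshold, are my own assumptions rather than anything git enforces.

```ruby
# Hypothetical helper: returns a list of problems found in a commit message.
# The rules and the three-word threshold are illustrative assumptions.
def commit_message_problems(message)
  lines = message.lines.map(&:chomp)
  subject = lines.first.to_s.strip
  # Everything after the subject and the separator line counts as the body.
  body = lines.drop(2).reject { |line| line.strip.empty? }

  problems = []
  problems << "subject is missing" if subject.empty?
  problems << "subject is too vague" if !subject.empty? && subject.split.size < 3
  problems << "second line should be blank" if lines.size > 1 && !lines[1].strip.empty?
  problems << "body explaining why is missing" if body.empty?
  problems
end

GOOD_MESSAGE = <<~MSG
  Restrict valid passwords to be 64 characters long

  Since the database field has a 64 character limit, passwords should
  only be declared valid by the PasswordValidator if they're 64 characters
  or fewer.
MSG

puts commit_message_problems("fixes").inspect
puts commit_message_problems(GOOD_MESSAGE).inspect
```

The lazy "fixes" message is flagged for a vague subject and a missing body, while the message from the case study passes with an empty list. A check like this can't tell whether the body actually explains why, of course; that part is still up to you.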

How to see the code history

There are a number of commands that will help you view commits over time and, combined with the long list of possible arguments, practically endless ways of viewing the information. Here are two that I find particularly useful.

git log

Run without any arguments, git log will show you all commits in your current branch, in descending date order. You can make this more targeted by showing only the commits that affect a single file, with git log -- <path/to/file>, and you can even show the full diffs alongside with the -p argument:

$ git log -p -- password_validator.rb
commit daca6dae0ca00ef954a2e4bc85b57a3c63bd3e1e
Author: Jon Cairns <jon@joncairns.com>
Date:   Tue Sep 29 15:30:14 2015 +0100

    Restrict valid passwords to be 64 characters long

    Since the database field has a 64 character limit, passwords should
    only be declared valid by the PasswordValidator if they're 64 characters
    or fewer.

diff --git a/password_validator.rb b/password_validator.rb
index 6735a39..06d4a2a 100644
--- a/password_validator.rb
+++ b/password_validator.rb
@@ -1,9 +1,5 @@
 class PasswordValidator
-  def initialize(password)
-    @password = password
-  end
-
-  def valid?
-    !!(password =~ /^\w{6,}$/)
+  def valid?(password)
+    password =~ /^\w{6,64}$/
   end
 end

commit 9eab442bf5d7f9dcd285412b8281e1bed0ca7cfa
Author: Jon Cairns <jon@joncairns.com>
Date:   Tue Sep 29 15:04:01 2015 +0100

    Add password validator

    Valid passwords are at least 6 characters long and contain only word
    characters.

diff --git a/password_validator.rb b/password_validator.rb
new file mode 100644
index 0000000..6735a39
--- /dev/null
+++ b/password_validator.rb
@@ -0,0 +1,9 @@
+class PasswordValidator
+  def initialize(password)
+    @password = password
+  end
+
+  def valid?
+    !!(password =~ /^\w{6,}$/)
+  end
+end

You can also view the commit log for a specific line range with -L, using the format <start>,<end>:<file>. To follow a single line, use the same number for the start and the end:

$ git log -p -L 3,3:password_validator.rb
...

git blame

The name of this command suggests a certain level of aggression, but I find it helpful to get an overview of how a file has been affected by commits over time. The output gives each line of a file prepended with the details of the most recent commit that affected that line, including the author of that commit:

$ git blame password_validator.rb
^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 1) class PasswordValidator
^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 2)   def valid?(password)
daca6dae (Jon Cairns 2015-09-29 15:30:14 +0100 3)     password =~ /^\w{6,64}$/
^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 4)   end
^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 5) end

You can view the full commit with git show <commit-sha>, to see the commit message and full diff.
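If you'd like to know which commits to pass to git show, a rough sketch like this groups the blame output by commit. It assumes the default blame layout shown above (no filename column), and BLAME_LINE and lines_by_commit are names I've made up for the example. For anything more serious, git blame --porcelain gives a format designed for scripts.

```ruby
# Sample input based on the git blame output above.
BLAME_OUTPUT = <<~'OUT'
  ^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 1) class PasswordValidator
  ^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 2)   def valid?(password)
  daca6dae (Jon Cairns 2015-09-29 15:30:14 +0100 3)     password =~ /^\w{6,64}$/
  ^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 4)   end
  ^9eab442 (Jon Cairns 2015-09-29 15:04:01 +0100 5) end
OUT

# Assumed layout: [^]sha (author date time zone line-number) code
BLAME_LINE = /\A\^?(?<sha>\h+) \((?<author>.+?)\s+(?<date>\d{4}-\d{2}-\d{2}) \S+ \S+\s+(?<line>\d+)\) (?<code>.*)\z/

# Returns a hash of abbreviated commit sha => line numbers last touched by it.
def lines_by_commit(blame_output)
  blame_output.each_line.with_object(Hash.new { |h, k| h[k] = [] }) do |raw, acc|
    match = BLAME_LINE.match(raw.chomp)
    next unless match # skip anything that doesn't look like a blame line

    acc[match[:sha]] << match[:line].to_i
  end
end

lines_by_commit(BLAME_OUTPUT).each do |sha, lines|
  puts "#{sha}: lines #{lines.join(', ')}"
end
```

Each sha printed here can then be handed to git show to read the message and diff behind those lines.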

Conclusion

Better commit messages can save you and the people working on the same project from future headaches, and will help you to learn why your code has evolved in the way it has. This adds a certain level of protection against bugs, and works as a kind of documentation. Get to know git log and git blame, and use them to understand the code you're about to change.
