Commit Often, Perfect Later, Publish Once: Git Best Practices

Best Practices vary from environment to environment, and there is no One True Answer, but still, this represents a consensus from #git and in some cases helps you frame the discussion for the generation of your very own best practices.

Table of Contents

Do read about git

Knowing where to look is half the battle. I strongly urge everyone to read (and support) the Pro Git book. The other resources are highly recommended by various people as well.

Do commit early and often

Git only takes full responsibility for your data when you commit. If you fail to commit and then do something poorly thought out, you can run into trouble. Additionally, having periodic checkpoints means that you can understand how you broke something.

People resist this out of some sense that this is ugly, limits git-bisection functionality, is confusing to observers, and might lead to accusations of stupidity. Well, I'm here to tell you that resisting this is ignorant. Commit Early And Often. If, after you are done, you want to pretend to the outside world that your work sprang complete from your mind into the repository in utter perfection with each concept fully thought out and divided into individual concept-commits, well git supports that: see Sausage Making below. However, don't let tomorrow's beauty stop you from performing continuous commits today.

Personally, I commit early and often and then let the sausage making be seen by all except in the most formal of circumstances (public projects with large numbers of users, developers, or high developer turnover). In less formal settings, like say this document, I let people see what really happened.

Don't panic

As long as you have committed your work (or in many cases even added it with git add) your work will not be lost for at least two weeks unless you really work at it (run commands that manually purge it).

See on undoing, fixing, or removing commits in git if you want to fix a particular problematic commit or commits, as opposed to attempting to locate lost data.

When attempting to find your lost commits, first make sure you will not lose any current work. You should commit or stash your current work before performing any recovery efforts that might destroy your current work and perhaps take backups of it (see Backups below). After finding the commits you can reset, rebase, cherry-pick, merge, or otherwise do what is necessary to get the commit history and work tree you desire.

There are three places where "lost" changes can be hiding. They might be in the reflog (git log -g), they might be in lost&found (git fsck --unreachable), or they might have been stashed (git stash list).
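
As a concrete sketch, here is a throwaway-repository demonstration of recovering a commit "lost" by git reset --hard, by finding it in the reflog (all paths and messages are invented for the demo):

```shell
# Create a disposable repo, "lose" a commit, then recover it.
tmp=$(mktemp -d) && cd "$tmp" && git init -q repo && cd repo
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "precious work"
lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1        # oops: "precious work" is now unreachable
git log -g --oneline | head -n 3  # ...but the reflog still remembers it
git reset -q --hard "$lost"       # recovered
```

In real recovery you would find the hash by reading the reflog output rather than having saved it in a variable beforehand.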

Do backups

Everyone always recommends taking backups as best practice, and I am going to do the same. However, you already may have a highly redundant distributed ad-hoc backup system in place! This is because essentially every clone is a backup. In many cases, you may want to use a clone for git experiments to perfect your method before trying it for real (this is most useful for git filter-branch and similar commands where your goal is to permanently destroy history without recourse—if you mess it up you may not have recourse). Still, probably you need a more formal system as well.

Traditional backups are still appropriate, and clones do not save git configurations, the working directory and index, non-standard refs, or dangling objects anyway. A normal tarball, cp, rsync, zip, rar or similar backup copy will be a perfectly fine backup. As long as the underlying filesystem doesn't reorder git I/O dramatically and there is not a long time delay between the scanning of the directory and the retrieval of the files, the resulting copy of .git should be consistent under almost all circumstances, even if taken while git operations are in progress (though see also discussions about custom backup techniques to ensure git consistency). Of course, if you have a backup from the middle of a git operation, you might need to do some recovery; the data should all be present, though. When performing git experiments involving items other than normally reachable commits, a copy instead of a clone may be more appropriate.

However, if you want a "pure git" solution that clones everything in a directory of repos, something like this may be what you need:

cd /src/backupgit
ls -F . | grep / > /tmp/.gitmissing1
ssh -n git.example.com ls -F /src/git/. | grep / > /tmp/.gitmissing2
diff /tmp/.gitmissing1 /tmp/.gitmissing2 | egrep '^>' |
  while read x f; do
    git clone --mirror "ssh://git.example.com/src/git/$f" "$f"
  done
rm -f /tmp/.gitmissing1 /tmp/.gitmissing2
for f in */.; do (cd "$f"; echo "$f"; git fetch); done

Don't change published history

Once you git push (or in theory someone pulls from your repo, but people who pull from a working repo often deserve what they get) your changes to the authoritative upstream repository or otherwise make the commits or tags publicly visible, you should ideally consider those commits etched in diamond for all eternity. If you later find out that you messed up, make new commits that fix the problems (possibly by revert, possibly by patching, etc).
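
For instance, a minimal sketch of fixing forward with git revert rather than rewriting published history (the repository and file names are invented):

```shell
# Disposable repo: a bad change is "published", so undo it with a new commit.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
echo good > file && git add file && git commit -q -m "good change"
echo bad > file && git commit -q -am "bad change"
git revert --no-edit HEAD   # new commit that undoes "bad change"
```

History now records both the mistake and its correction, and nobody downstream has to recover from a rewritten branch.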

Yes, of course git allows you to rewrite public history, but it is problematic for everyone and thus it is just not best practice to do so.

I've said it and I believe it, but…on occasion…if well managed…there are times when changing published history is perhaps a normal course of business. You can plan for particular branches (integration branches especially) or (better) special alternate repositories to be continually rewritten as a matter of course. You see this in git.git with the "pu" branch, for example. Obviously this process must be well controlled and ideally done by one of your most experienced and trusted engineers, because auditing merges (and worse, non-merge commits) you have essentially seen before is extremely tedious, boring, and error prone, and you have lost the protection of git's cryptographic history validation.

Do choose a workflow

Some people have called git a tool to create an SCM workflow instead of an SCM tool. There is some truth to this. I am not going to espouse one specific workflow as the best practice for using git, since it depends heavily on the size and type of project and the skill of users, developers, and release engineers; however, both reflexive avoidance of branches due to the stupidity of other SCM systems and reflexive overuse of branches (since branches are actually easy with git) are most likely signs of ignorance. Pick the style that best suits your project and don't complain about users' tactical use of private branches.

I also implore managers who may be thinking of making specific workflow rules by fiat to remember that not all projects are identical, and rules that work on one project may not work on another. People who blather on about continuous integration, rolling deployment, and entirely independent feature changes that you can pick and choose between independently are absolutely correct, for their project! However, there are many projects and features which are much more complicated and may take longer than a particular sprint/unit-of-time and require multiple people to complete and have complex interdependencies with other features. It is not a sign of stupidity but rather of complexity and, just perhaps, brilliant developers, who can keep it all straight. It can also lead to a market advantage since you can deploy a differentiating feature which your competitors cannot in a short timeframe.

Branch workflows

Answering the following questions helps you choose a branch workflow:

See the following references for more information on branch workflows.

However, also understand that everyone already has an implicit private branch due to their cloned repository: they can do work locally, do a git pull --rebase when they are done, perform final testing, and then push their work out. If you run into a situation where you might need the benefits of a feature branch before you are done, you can even retroactively commit and branch, then optionally reset your primary branch back to @{u}. Once you push, you lose that ability.
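
A sketch of that retroactive commit-and-branch maneuver, using throwaway repositories (all names are invented):

```shell
# Set up a bare "origin" and a working clone to stand in for real repos.
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work" 2>/dev/null && cd "$tmp/work"
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial" && git push -q -u origin HEAD
git commit -q --allow-empty -m "experiment"   # local-only, unpushed work
git branch feature/experiment                 # keep it on a branch...
git reset -q --hard @{u}                      # ...and rewind to upstream
```

The unpushed work now lives on feature/experiment while the primary branch matches the upstream again.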

Some people have been very successful with just master and $RELEASE branches ($RELEASE branch for QA and polishing, master for features, specific to each released version). Other people have been very successful with many feature branches, integration branches, QA, and release branches. The faster the release cycle and the more experimental the changes, the more branches will be useful—continuous releases or large refactoring projects seem to suggest larger numbers of branches (note the number of branches is the tail, not the dog: more branches will not make you release faster).

The importance of some of the questions I asked may not be immediately obvious. For example, how does having work which needs to be updated in multiple distinct long-lived branches affect branch workflow? Well, you may want to try to have a "core" branch which these other branches diverge from, and then have your feature/bugfix branches involving these multiple branches come off of the lowest-common-merge-base (LCMB) for these long-lived branches. This way, you make your change (potentially merge your feature branch back into the "core" branch), and then merge the "core" branch back into all of the other long-lived branches. This avoids the dreaded cherry-pick workflow.
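
A sketch of that LCMB technique using git merge-base (the branch names maint and trunk are invented for the demo):

```shell
# Two long-lived branches diverge from common history.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "common history"
git branch maint && git branch trunk
git checkout -q trunk && git commit -q --allow-empty -m "trunk-only work"
git checkout -q maint && git commit -q --allow-empty -m "maint-only work"
# Branch the shared fix from the lowest common merge base...
git checkout -q -b fix/shared "$(git merge-base maint trunk)"
git commit -q --allow-empty -m "shared fix"
# ...then merge it into every long-lived branch; no cherry-picks needed.
git checkout -q maint && git merge -q --no-edit fix/shared
git checkout -q trunk && git merge -q --no-edit fix/shared
```

Because the fix branches from the merge base, each long-lived branch receives exactly the same commits, keeping git's merge machinery (rather than duplicated cherry-picks) in charge of history.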

Branch naming conventions are also often overlooked. You must have conventions for naming release branches, integration branches, QA branches, feature branches (if applicable), tactical branches, team branches, user branches, etc. Also, if you share repositories with other projects/groups, you probably will need a way to disambiguate your branches from their branches. Don't be afraid of "/" in the branch name when appropriate (but do be afraid of using a remote's name as a directory component of a branch name, or correspondingly naming a remote after a branch name or directory component).

Distributed workflows

Answering the following questions helps you choose a distributed workflow:

See the following references for more information on distributed workflows.

Cathedrals (traditional corporate development models) often want to have (or to pretend to have) the one true centralized repository. Bazaars (linux, and the Github-promoted workflow) often want to have many repositories with some method to notify a higher authority that you have work to integrate (pull requests).

However, even if you go for, say, a traditional corporate centralized development model, don't forbid self-organized teams from creating their own repositories for their own tactical reasons. Even having to fill out a justification form is probably too cumbersome.

Release tagging

Choosing your release workflow (how to get the code to the customer) is another important decision. You should have already considered most of the issues when going over the branching and distributed workflow above, but less obviously, it may affect how and when you perform tagging, and specifically the name of the tag you use.

At first glance, it is a no-brainer. When you release something you tag something, and of course I highly recommend this. However, tags should be treated as immutable once you push. Well, that only makes sense, you might think to yourself, but consider this: five minutes after everyone has signed off on the 2.0 release, it has been tagged Frobber_Release_2.0 and pushed, but before any customer has seen the resulting product someone comes running in "OMFG, the foobar is broken when you frobnoz the baz." What do you do? Do you skip release 2.0 and tag 2.0.1? Do you do a take-back and go to every repo of every developer and delete the 2.0 tag?

Here are two ideas for your consideration. Instead of a release tag, use a release branch with the marketing name (and then stop committing to that branch after release, disabling write access to it in gitolite or something). Another idea: use an internal tag name that is not directly derived from the version number that marketing wishes to declare to the outside world. The problem with the branch idea is that if you cannot (or forget to) disable write access, then someone might accidentally commit to that branch, leading to confusion about what was actually released to the customer. The problem with the tag idea is that you need to remember what the final shipped tag name is, independent from the release name. However, if you use both techniques, they cover for each other's disadvantages. In any case, using either technique will be better than using marketing-version tags (as I know from experience).
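
A sketch combining both ideas (the frobber names and build number are invented for illustration):

```shell
# Disposable repo standing in for the real release repository.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "release candidate"
# Immutable internal tag, not derived from the marketing version.
git tag -a -m "build signed off for Frobber 2.0" build-1234
# Marketing-named release branch pointing at the same commit.
git branch release/frobber-2.0 build-1234
# If the frobnoz breaks five minutes later, cut build-1235 and move the
# branch; build-1234 itself is never deleted or reused.
```
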

Security model

You might ask why security is not a top-level item and is near the end of the workflow section. Well, that is because in an ideal world your security should support your workflow, not be an impediment to it.

For instance, did you decide that certain branches should only be accessible to certain people? Did you decide that certain repositories should only have certain people able to access or write to them?

While git allows users to set up many different types of access control, access methods, and the like, the best choice for most deployments might be to set up a centralized git master repository with a gitolite manager to provide fine-grained access control with ssh-based authentication and encryption.

Of course, security is more than access control. It is also assurance that what you release is what was written by the people it should be written by, and what was tested. Git provides you this for free, but certain formal users may wish to use signed tags. Watch for signed pushes in a future version of git.

Do divide work into repositories

Repositories sometimes get used to store things that they should not, simply because they were there. Try to avoid doing so.

Do make useful commit messages

Creating insightful and descriptive commit messages is one of the best things you can do for others who use the repository. It lets people quickly understand changes without having to read code. When doing history archeology to answer some question, good commit messages likewise become very important.

The normal git rule of using the first line to provide a short (50-72 character) summary of the change is also very good. Looking at the output of gitk or git log --oneline might help you understand why.

Also see A Note About Git Commit Messages for even more good ideas.

While this relates to the later topic of integration with external tools, including bug/issue/request tracking numbers in your commit messages provides a great deal of associated information to people trying to understand what is going on. You should also enforce your standards on commit messages, when possible, through hooks. See Enforcing standards below.
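
As one sketch of enforcing this, a client-side commit-msg hook that insists on a tracker reference (the "PROJ-123" pattern and messages are invented; adapt them to your tracker):

```shell
# Disposable repo with a commit-msg hook requiring an issue reference.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
# Reject any commit message lacking an issue reference like PROJ-123.
grep -qE 'PROJ-[0-9]+' "$1" || {
  echo "commit message must reference an issue (e.g. PROJ-123)" >&2
  exit 1
}
EOF
chmod +x .git/hooks/commit-msg
git commit -q --allow-empty -m "PROJ-42: wire tracker ids into messages"
```

A message without a reference would be rejected by the hook; remember that client-side hooks are advisory, hence the server-side enforcement discussed below.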

On Sausage Making

Some people like to hide the sausage making ¹, or in other words pretend to the outside world that their commits sprang full-formed in utter perfection into their git repository. Certain large public projects demand this, others demand smushing all work into one large commit, and still others do not care.

A good reason to hide the sausage making is if you feel you may be cherry-picking commits a lot (though this too is often a sign of bad workflow). Having one or a small number of commits to pick is much easier than having to find one commit here, one there, and half of this other one. The latter approach makes your problem much much harder and typically will lead to merge conflicts when the donor branch is finally merged in.

Another good reason is to ensure each commit compiles and/or passes regression tests, and represents a single, easily understood concept. The former allows git-bisect to choose any commit and have a good chance of that commit doing something useful, and the latter allows for easy change/commit/code review, understanding, archeology, and cherry-picking. For example, a reviewer might see something suspicious in a commit and then have to spend time tracking down their suspicions and writing them up, only to discover five commits later that the original developer subsequently found and fixed the problem, wasting the reviewer's time (reviewing the entire patch series as one diff fixes this problem but greatly adds complexity as multiple concepts get muddled). By cleaning up patches into single, logical changes that build on one another, and which don't individually regress (i.e., they are always moving towards some desirable common endpoint), the author is writing a chronological story not of what happened, but of what should happen, with the intent of convincing the audience (i.e., the reviewers) that the change is the right thing to do. Proponents claim it is all about leaving a history others can later use to understand why the code became the way it is now, making it less likely for others to break it.

The downside to hiding the sausage making is the added time it takes to perfect the administrative parts of the developer's job. It is time taken away from getting code working; time solely dedicated to either administrative beauty or enhancing the ability to perform the blame-based (or ego-full) development methodology.

If you think about it, movies are made this way. Scenes are shot out of temporal order, multiple times, and different bits are picked from this camera and that camera. Without examining the analogy too closely, this is similar to how different git commits might be viewed. Once you have everything in the "can" (repository) you go back and in post-production, you edit and splice everything together to form individual cuts and scenes, sometimes perhaps even doing some digital editing of the resulting product.

git rebase -i, git add -p, and git reset -p can fix commits up in post-production by splitting different concepts, merging fixes into older commits, etc. See Post-Production Editing using Git; also TopGit and StGit.
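
A non-interactive sketch of such post-production: a fix destined for an earlier commit is folded in by rebase --autosquash (the sequence editor is stubbed out so the demo runs unattended; file and messages are invented):

```shell
# Disposable repo: an early commit, unrelated work, then a late fix.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
echo one > file && git add file && git commit -q -m "add feature"
git commit -q --allow-empty -m "unrelated work"
echo two >> file && git add file
git commit -q --fixup HEAD^            # a "fixup!" commit aimed at "add feature"
# --autosquash reorders and squashes the fixup into its target; the
# stubbed editor simply accepts the generated todo list.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root
```

After the rebase the "fixup!" commit has vanished into "add feature", leaving a clean two-commit story.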

Be sure you do all of this work before doing any non-squashed merges (not rebases: merges) and before pushing. Your work becomes much more complex and/or impossible afterwards.

¹ The process of developing software, similar to the process of making sausage, is a messy messy business²; all sorts of stuff happens in the process of developing software. Bugs are inserted into the code, uncovered, patched over. The end result may be a tasty program, but anyone looking at the process of how it was created (through inspection of the commits) may end up with a sour taste in their mouth. If you hide the sausage making, you can create a beautiful looking history where each step looks as delicious as the end-product. Back to footnote reference.

² If you do not understand why someone would want to hide the sausage making, and you enjoy eating sausage, never, ever, watch sausages being made, read "The Jungle", or otherwise try to expose yourself to any part of the sausage making process. You will lead a much tastier (and perhaps shorter) life in your blissful ignorance.

Do keep up to date

This section has some overlap with workflow. Exactly how and when you update your branches and repositories is very much associated with the desired workflow. Also I will note that not everyone agrees with these ideas (but they should!)

Do periodic maintenance

The first two items should be run on your server repositories as well as your user repositories.
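
The canonical maintenance commands in question are, I assume here, git fsck and git gc; a periodic pass might look like this (the disposable repo exists only so the commands have something to act on):

```shell
# Disposable repo standing in for a real server or user repository.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "some work"
git fsck --full   # verify object integrity and connectivity
git gc --auto     # repack and prune only when thresholds are exceeded
```

Running these from cron (or via git maintenance, in newer versions of git) keeps repositories fast and surfaces corruption early.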

Do enforce standards

Having standards is a best practice and will improve the quality of your commits, code-base, and probably enhance git-bisect and archeology functionality, but what is the use of a standard if people ignore it? Checks could involve regression tests, compilation tests, syntax/lint checkers, commit message analysis, etc. Of course, there are times when standards get in the way of doing work, so provide some method to temporarily disable the checks when appropriate.

Traditionally, and in some people's views ideally, you would enforce the checks on the client side in a pre-commit hook (perhaps by keeping a directory of standard hooks in your repo and asking users to install them), but since users will often not install said hooks, you also need to enforce the standards on the server side. Additionally, if you follow the commit-early-and-often-and-perfect-it-later philosophy that is promoted in this document, initial commits may not satisfy the hooks.

Enforcing standards in an update hook on the server allows you to reject commits that don't follow the standards. You can also chide users for not using the standard client-side hook to begin with (if you recommend that approach).

See Puppet Version Control for an example for a "Git Update Hook" and "Git Pre-Commit Hook" that enforces certain standards. Note that the update hook is examining files individually instead of providing whole-repository testing. Whether individual files can be tested in isolation for your standards or whether you need the whole repository (for instance, any language where one file can reference or include another might need whole repository checks) is of course a personal choice. The referenced examples are useful for ideas, anyway.

Do use useful tools

More than useful, use of these tools may help you form a best practice!

Do integrate with external tools

Increasing communication and decreasing friction and roadblocks to your developers' work will have many advantages. If you make something easy, convenient, and useful to do, people may well do it.

Miscellaneous "Do"s

These are random best practices that are too minor or disconnected to go in any other section.

Miscellaneous "don't"s

In this list of things to not do, it is important to remember that there are legitimate reasons to do all of these. However, you should not attempt any of these things without understanding the potential negative effects of each and why they might be in a best practices "Don't" list.

DO NOT

Disclaimer

Information is not promised or guaranteed to be correct, current, or complete, and may be out of date and may contain technical inaccuracies or typographical errors. Any reliance on this material is at your own risk. No one assumes any responsibility (and everyone expressly disclaims responsibility) for updates to keep information current or to ensure the accuracy or completeness of any posted information. Accordingly, you should confirm the accuracy and completeness of all posted information before making any decision related to any and all matters described.

Copyright

Copyright © 2012 Seth Robertson

Creative Commons Attribution-ShareAlike 3.0 Generic (CC BY-SA 3.0) http://creativecommons.org/licenses/by-sa/3.0/

OR

GNU Free Documentation v1.3 with no Invariant, Front, or Back Cover texts. http://www.gnu.org/licenses/fdl.html

I would appreciate changes being sent back to me, being notified if this is used or highlighted in some special way, and links being maintained back to the authoritative source. Thanks.

Thanks

Thanks to the experts on #git, and my co-workers, for review, feedback, and ideas.

Comments

Comments and improvements welcome.

Use the github issue tracker or discuss with SethRobertson (and others) on #git