No More Waiting With Git Worktrees

On my current project, Git worktrees have become an essential part of my workflow for running test suites in parallel. Nevertheless, when I look at my colleagues, I think worktrees are still not used as much as they could be. In this post I will shed some light on when I use worktrees (and when I do not), how to think of worktrees, and how I use them in practice.

The code in this post has been executed at build time of this website with the following version of Git:

git --version
git version 2.50.1

Worktrees or stashes?

To start learning more about Git worktrees, the official Git documentation is a great starting point. The documentation helpfully provides an example of when you might want to use them:

You are in the middle of a refactoring session and your boss comes in and demands that you fix something immediately. You might typically use git-stash[1] to store your changes away temporarily, however, your working tree is in such a state of disarray (with new, moved, and removed files, and other bits and pieces strewn around) that you don’t want to risk disturbing any of it. Instead, you create a temporary linked worktree to make the emergency fix, remove it when done, and then resume your earlier refactoring session.

Although the example uses some technical terms, the general gist should be clear if we take a worktree or working tree to mean a directory you use for coding, staging and committing your changes. A linked worktree is then the equivalent of a second working directory, containing a copy of the code at a commit of your choosing. We will clarify these terms when we dive into the details, but for now this is a fine working definition.

Unfortunately, the example does not clarify what is meant by a “state of disarray”. I will admit, it does not convince me of why you should use worktrees. Most of the time I actually prefer stashing my changes and switching to a different branch. In fact, the only time I use linked worktrees is when stashing would not solve my problem. Of course, this raises the question: which problem do worktrees solve that stashing cannot?

When a single directory is not enough

When working on a data engineering project, it is generally good practice to have a suite of end-to-end tests. Even though end-to-end tests should preferably be run in CI, it can still be a good idea to first run the test suite locally, to avoid unnecessarily blocking a runner with a test that would have failed anyway. Such an end-to-end test suite will often take a while to run, as it has to push a substantial amount of data through your pipelines to cover most of the happy path. On my current project, running the full end-to-end test suite takes around an hour. Because the test suite takes such a long time to finish, we have also set it up such that it writes its output to disk.

Due to this persistence to disk and the long execution time, I am effectively blocked from making changes in my working tree while the test is running. If in the meantime a colleague requests a code review and I want to run the test suite for their branch, I now have to choose between stopping the already running test suite and putting the code review on hold. Neither of these options is ideal.

The solution to this problem is to create a second directory with the code from my colleague's branch and without the output of the already running test suite. This is precisely what a linked worktree is!
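As a minimal sketch of this scenario, the commands below set up a throwaway repository and check out a second branch in a sibling directory while untracked test output sits in the first one. The branch name feature-x, the directory names and the test-output folder are hypothetical placeholders, not the names used on my project.

```shell
# Throwaway setup so the sketch runs on its own; every name here is hypothetical.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"
touch foo
git add foo
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add foo"
git branch feature-x                      # stands in for the colleague's branch

# Pretend a long-running test suite is writing its output into this directory.
mkdir test-output
touch test-output/run-1.log

# Check out the colleague's branch in a second directory next to the first one.
git worktree add -q ../project-review feature-x

# The new directory contains the tracked files of feature-x,
# but none of the untracked test output from the first directory.
review_branch=$(git -C ../project-review branch --show-current)
ls ../project-review
```

The test suite for the colleague's branch can then be started from the second directory without touching the run that is still in progress in the first one.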

Of course, long-running test jobs are only one particular example. The same line of reasoning holds for any long-running job that could block your initial directory for a long time, compilation being another such example. In other words,

Git worktrees allow me to cleanly run long-running jobs in parallel.

For this use case, I sometimes like to think of this workflow as similar to cloning a repository multiple times, but without any of the downsides of trying to keep multiple clones in sync.

Working trees and repositories

Before we explain what a worktree is and how the git worktree commands operate, let us briefly go over what the terms “working trees” and “repositories” mean and how they relate to each other. When we initialize (or clone) a non-bare repository, Git creates both a repository and a working tree:

git init /tmp/project
Initialized empty Git repository in /tmp/project/.git/

The directory /tmp/project is known as the working tree and can roughly speaking be thought of as a workspace that allows you to stage and commit your files. The /tmp/project/.git directory is called the repository and is fully managed by Git. It contains all the data and metadata Git requires for its version control capabilities. Although you might colloquially call /tmp/project the repository, it is important to keep in mind that the repository is actually the .git directory it contains.
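If you want Git itself to tell you which directory plays which role, git rev-parse can report both. The sketch below uses a temporary directory instead of /tmp/project so that it can be run anywhere; the variable names are my own.

```shell
# Create a fresh non-bare repository in a temporary location.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"

# Top-level directory of the working tree.
worktree_dir=$(git rev-parse --show-toplevel)
# Absolute path of the repository (the .git directory).
git_dir=$(git rev-parse --absolute-git-dir)

echo "working tree: $worktree_dir"
echo "repository:   $git_dir"
```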

Let us create a simple commit and sketch the structure so far:

cd /tmp/project
touch foo
git add foo
git commit -m "Add foo"
[main (root-commit) 1ad00df] Add foo
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 foo


Figure 1: Project structure after initialization of a Git repository.

In this figure and in all the next ones, we will depict the repository in orange and the working tree(s) in green.

It is important to understand that this is only the default setup for initializing a repository. Even without resorting to worktrees, we can instruct Git to create a repository outside of the working tree, or to leave out the working tree altogether, by specifying additional initialization flags. Let us have a brief look at how this works.

Bare repositories

Bare repositories are mainly used on remote servers to host Git repositories that you push code to and pull code from. You do not typically work in a bare repository directly, so unlike a non-bare repository, a bare repository is not initialized with an associated working tree (although it can have one).

A bare repository is initialized by using the --bare flag as follows:

git init --bare /tmp/project-2.git
Initialized empty Git repository in /tmp/project-2.git/


Figure 2: A bare repository without any working trees.

If you are using a bare repository, it is common practice to name it after the project name, but to suffix it with .git.
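To check that a repository really is bare, you can again ask git rev-parse. This is a small sketch in a temporary directory; the variable names are only for illustration.

```shell
# Create a bare repository in a temporary location.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q --bare -b main "$root/project-2.git"

# Ask Git whether the repository is bare and whether we are inside a working tree.
is_bare=$(git -C "$root/project-2.git" rev-parse --is-bare-repository)
in_tree=$(git -C "$root/project-2.git" rev-parse --is-inside-work-tree)
echo "bare: $is_bare, inside a working tree: $in_tree"

# The files that normally live in .git sit directly in the directory itself.
ls "$root/project-2.git"
```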

Separate repository

Let us now consider the other example of separating the repository from its working tree. We can create a non-bare repository which is not contained in its associated working tree by using the --separate-git-dir option:

git init --separate-git-dir /tmp/project-3.git /tmp/project-3
Initialized empty Git repository in /tmp/project-3.git/


Figure 3: A non-bare repository not contained in the initial working tree.

If the repository is not contained in the working tree, how is Git informed of the location of the working tree? You might assume the repository contains a pointer to the working tree, but it is actually the other way around! The working tree itself contains a .git file which points back to the repository:

cat /tmp/project-3/.git
gitdir: /tmp/project-3.git

This is why you can still use all Git commands in the working tree, even when it is separated from its repository.
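The following sketch repeats the setup in a temporary directory and asks Git, from inside the working tree, which repository it resolves to. The paths mirror the project-3 naming used above but live under a temporary root.

```shell
# Create a working tree whose repository lives in a sibling directory.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main --separate-git-dir "$root/project-3.git" "$root/project-3"
cd "$root/project-3"

# Git follows the .git file and resolves the repository location from it.
resolved=$(git rev-parse --absolute-git-dir)
echo "$resolved"

# Regular commands work as usual from the separated working tree.
git status --short --branch
```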

I am not aware of a naming convention for repositories that are separate from their working tree, but I think it is good practice to follow the same naming scheme as for bare repositories. Just be aware that project-3.git does not refer to a bare repository in this case.

Adding more working trees

We can extend this idea with the git worktree commands: with Git worktrees, we can create any number of working trees for a repository, at any location we desire. Adding a working tree is done through the git worktree add command:

git worktree add ../project-worktree -b project-worktree
Preparing worktree (new branch 'project-worktree')
HEAD is now at 1ad00df Add foo

By default, Git only allows a branch to be checked out in one working tree at a time. Upon creation of a working tree, we therefore also create a new branch using the -b flag, which avoids this limitation.
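The sketch below illustrates this restriction in a throwaway repository. It also shows one alternative to creating a new branch, namely a detached HEAD via --detach; that alternative is my own addition here, and the directory names are hypothetical.

```shell
# Throwaway repository with a single commit on main.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "Initial commit"

# main is already checked out in the main worktree, so Git refuses a second checkout.
if git worktree add ../project-copy main 2>/dev/null; then
    second_checkout=allowed
else
    second_checkout=refused
fi
echo "second checkout of main: $second_checkout"

# A detached HEAD at the same commit works without creating a new branch.
git worktree add -q --detach ../project-detached main
detached_branch=$(git -C ../project-detached branch --show-current)   # empty when detached
```

A detached worktree is convenient for read-only tasks such as running tests; for work you intend to commit, a dedicated branch is usually the safer choice.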

So far we have loosely used the terms worktree and working tree interchangeably, but there is actually a slight difference: To keep track of different working trees, Git uses some metadata, and it is the combination of this metadata and the working tree that is known as the worktree. The initially created worktree is the main worktree and any worktrees added with the git worktree add command are called linked worktrees.

Similar to the case of a working tree separated from its repository, linked worktrees do not contain a .git directory, but a .git file. This file points back to the worktrees subdirectory of the repository:

cd ../project-worktree
cat .git
gitdir: /tmp/project/.git/worktrees/project-worktree

Because of this pointer, we can interact with the repository from our linked worktree /tmp/project-worktree with the exact same commands as from our main worktree. For instance, let us add and commit a file bar and sketch the structure of the repository and its working trees:

touch bar
git add bar
git commit -m "Add bar"
[project-worktree d2b5a23] Add bar
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 bar


Figure 4: A repository with a main worktree and a single linked worktree.
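To see that both worktrees really share one repository, the sketch below recreates the setup in a temporary directory (with empty commits instead of files, to keep it short) and inspects it from both sides. The --git-common-dir option of git rev-parse reports the shared repository, while the regular Git directory of a linked worktree is its private metadata directory.

```shell
# Recreate a main worktree and one linked worktree in a temporary location.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "Add foo"
git worktree add -q -b project-worktree ../project-worktree

# Commit from inside the linked worktree.
cd ../project-worktree
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "Add bar"

# Per-worktree metadata versus the repository shared by all worktrees.
private_dir=$(git rev-parse --absolute-git-dir)
common_dir=$(cd "$(git rev-parse --git-common-dir)" && pwd -P)
echo "private: $private_dir"
echo "shared:  $common_dir"

# The commit made in the linked worktree is visible from the main worktree.
subject=$(git -C "$root/project" log -1 --format=%s project-worktree)
echo "$subject"
```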

The git worktree list command lists all worktrees associated with a repository. The main worktree is listed first:

git worktree list
/tmp/project           1ad00df [main]
/tmp/project-worktree  d2b5a23 [project-worktree]

If you do not need a worktree anymore, you can remove it as follows:

git worktree remove project-worktree

By default, Git tries to ensure you do not lose any data when you remove a worktree. As such, Git will refuse to remove the main worktree or any linked worktree that contains modified or untracked files.
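The sketch below shows this safety check in a throwaway repository, together with the --force flag that overrides it. The worktree and file names are hypothetical.

```shell
# Throwaway repository with one linked worktree.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "Initial commit"
git worktree add -q -b scratch ../project-scratch

# An untracked file makes the linked worktree unclean.
touch ../project-scratch/notes.txt

# A plain remove is refused to protect the untracked file.
if git worktree remove ../project-scratch 2>/dev/null; then
    plain_remove=succeeded
else
    plain_remove=refused
fi
echo "plain remove: $plain_remove"

# --force removes the worktree together with the untracked file.
git worktree remove --force ../project-scratch

# Only the working tree is gone; the branch is still in the repository.
git branch --list scratch
```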

How I structure my worktrees

Keeping track of multiple worktrees causes a bit of extra overhead, so if you do start using them, I would recommend keeping some structure to them. Personally, I like to create worktrees either for a long-lived branch, such as a development branch, or for a recurring type of work like developing, bug fixing or code reviewing. In both of these cases, I like to use specific naming conventions to keep track of them.

If I want to follow the development branch dev of project, I would create a worktree called project@dev. For a worktree used for a specific workflow, I would use a +-sign instead. For instance, the worktree for code reviews would be called project+review. In both cases, I keep these linked worktrees in the same directory as the main worktree. The main worktree retains its original name project and I often configure it to simply track the main branch of my repository.
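As a rough sketch of these conventions, the commands below create both kinds of worktree next to a throwaway main worktree. Starting the review worktree with a detached HEAD is an assumption made for the example; any branch or commit could be checked out there instead.

```shell
# Throwaway repository with a main and a dev branch.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "Initial commit"
git branch dev

# Long-lived branch: project@dev follows the dev branch.
git worktree add -q "../project@dev" dev

# Recurring type of work: project+review is not tied to one branch,
# so this sketch starts it with a detached HEAD (an assumption for illustration).
git worktree add -q --detach "../project+review"

# The main worktree is listed first, followed by the two linked worktrees.
git worktree list
```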

These conventions allow me to start with only the main worktree and add and remove linked worktrees as I feel necessary. Any working tree containing a + or @ in its name is a linked worktree and can safely be deleted without permanently corrupting or deleting my repository. This works even if I accidentally delete the directory instead of using git worktree remove.
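The sketch below simulates such an accidental deletion in a throwaway repository. After the directory is gone, the repository still holds a stale administrative entry for it, which git worktree prune cleans up. The branch and directory names are hypothetical.

```shell
# Throwaway repository with one linked worktree.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "Initial commit"
git worktree add -q -b review "../project+review"

# The "accidental" delete: remove the directory without telling Git.
rm -rf "../project+review"

# Git still has an entry for the missing worktree at this point.
git worktree list

# Drop the stale administrative files for worktrees whose directory is gone.
git worktree prune
remaining=$(git worktree list | wc -l | tr -d ' ')
echo "worktrees left: $remaining"

# The repository and the branch that was checked out are untouched.
git branch --list review
```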

Portable worktrees

To round off this post, I would like to share an interesting discovery I made while reading through the worktree documentation:

If the working tree for a linked worktree is stored on a portable device or network share which is not always mounted, you can prevent its administrative files from being pruned by issuing the git worktree lock command, optionally specifying --reason to explain why the worktree is locked.
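To get a feel for what locking does, the sketch below locks a linked worktree and then simulates an unmounted device by moving the directory away. The paths and the lock reason are made up for the example; on a real setup the linked worktree would live on the portable device itself.

```shell
# Throwaway repository with one linked worktree that we pretend is on a removable drive.
root=$(cd "$(mktemp -d)" && pwd -P)
git init -q -b main "$root/project"
cd "$root/project"
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "Initial commit"
git worktree add -q -b portable ../project-portable

# Lock the worktree and record why.
git worktree lock --reason "lives on a removable drive" ../project-portable

# Simulate the drive being unmounted: the directory disappears from its path.
mv ../project-portable ../unmounted

# Pruning skips locked worktrees, so the entry survives.
git worktree prune
entries=$(git worktree list | wc -l | tr -d ' ')
echo "worktrees known to Git: $entries"
```

Once the device is available again, git worktree unlock releases the lock so that the worktree can be removed or pruned as usual.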

Why would you want to use this? One reason I can think of is that it can be useful when you are working in a storage-constrained environment. Since linked worktrees (similar to repositories with separate working trees) only contain a pointer to the Git repository, you could, for example, create them on an embedded system. In this way you could still use Git to develop, but without the storage footprint of keeping the entire Git repository on the device. This is precisely the use case I came across in Developing for CircuitPython with git-worktree. It is a nice trick to have in your toolbox and a good example of the value of clear documentation!