Git Filter Branch

This HOWTO describes the steps required to extract a portion of a large git repository into it's own separate repository.

I have found this useful numerous times because when I first started to use git I created a single repository for all of my personal projects. Since then I've come to appreciate having many smaller repositories. So whenever I want to open up some new chunk of code I use these steps to extract the history for just that chunk into a new repository and then publish that.

Setup

I always start by doing a fresh clone of the repository that I want to extract from, so that I know I'm not messing with my primary development environment. This has the added advantage that as long as you don't clone with --recursive, you won't have any submodules checked out. If you have submodules then the filter branch can get into trouble trouble.

git clone git://source.socialhacker.com/... temp_clone

Then just to make sure I'm not going to do something stupid, I remove the remote from the newly cloned repository.

cd temp_clone
git remote rm origin

Now we're ready to run git filter-branch. In particular, I run the subdirectory filter which replays all of your commits, only keeping changes to a given subdirectory. You need to add the --prune-empty option to cause filter-branch to not include empty commits.

git filter-branch \
    --prune-empty \
    --subdirectory-filter <directory> \
    -- \
    --all

Now, at this point I like to fix up the history a little. In particular I fix up the commit messages with the name and email address I've decided on. For a while I hadn't set these correctly and my early commits have something pretty useless. This will only work if you're the only committer to your repository. If you have multiple people committing, then this will wipe out their author names and email addresses from their commits.

git filter-branch \
    -f \
    --env-filter "export GIT_AUTHOR_NAME='Anton Staaf'; export GIT_AUTHOR_EMAIL='anton@socialhacker.com';" \
    HEAD

git filter-branch \
    -f \
    --env-filter "export GIT_COMMITTER_NAME='Anton Staaf'; export GIT_COMMITTER_EMAIL='anton@socialhacker.com';" \
    HEAD

The final step before you can push your new repository is to remove all of the old information about the commits that you no longer want to be visible. The first line below clears out the reflog, so that it doesn't maintain references to the old state of the repository. The second line does a garbage collection run on the repository. This will remove any objects that are no longer referenced.

git reflog expire --expire=now --all
git gc --aggressive --prune=now

At this point you can add a new remote and push your repository.