Editing Commit History in Git/GitHub

I recently tweeted about an issue I was having with Git authors:

A downside of gamified services is that when they fail it can be _de_motivating. My new laptop's Git author was wrong (100% my fault 🤦‍♂️), so I'm missing a month of commits. There's no real consequence, but I somehow feel anxious (and angry!) about those white squares! #CodeNewbie
@theAdhocracy

I wasn't kidding about the anxiety and anger either; those white squares genuinely got to me in a way that was probably not healthy. Rather than accept their blank, colourless state though, I decided to look into how I could change my commit history, and it turns out that it's possible (if a little frowned upon)[1]. As a point of reference for future cock-ups, and because I found the official response on GitHub a little incomplete, I figured I'd record the steps that I took:

First, I'd recommend creating a new folder. It really doesn't matter where this is (so long as it hasn't got any subfolders that already contain .git files). Personally I went with a folder called git_clone in my base www folder, which I still use despite largely having moved away from WAMPServer and similar tools (it just keeps all my files in one place).

Once the folder exists, open up a new terminal. I've done this using the inbuilt panels in VS Code, a standard Windows PowerShell terminal, and a Git Bash terminal, so it mostly shouldn't matter what you use (though see the PowerShell gotcha further down).

You then just need to follow the steps outlined by GitHub here: https://help.github.com/en/articles/changing-author-info

However, there are a few gotchas that I've run into. For example, what if you don't know the incorrect author details? I happened to know what my incorrect email was, but if you've never set up your local author then it could be almost anything. Luckily, this is the easiest hurdle to overcome. Browse through your commit history on GitHub until you find a commit with the wrong author, then copy the commit ID. Back in the terminal, just use git show <commit-id> with that copied ID to see the author details and make a note of them.
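If you'd rather not scan the full git show output, a format string can narrow it down to just the identity fields. What follows is only a rough sketch: it builds a throwaway repository in a temporary folder (the Wrongname identity is made up for the example) and then prints the author and committer for a single commit.

```shell
# Build a throwaway repo so nothing real is touched (all names here are made up).
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
GIT_AUTHOR_NAME="Wrongname" GIT_AUTHOR_EMAIL="wrongname@wrongaddress.com" \
GIT_COMMITTER_NAME="Wrongname" GIT_COMMITTER_EMAIL="wrongname@wrongaddress.com" \
    git commit -q --allow-empty -m "Commit made with the wrong author"

# In a real repo, swap HEAD for the commit ID copied from GitHub.
# %an/%ae are the author name/email, %cn/%ce the committer name/email.
identity=$(git show -s --format='Author:    %an <%ae>%nCommitter: %cn <%ce>' HEAD)
echo "$identity"
```

In your own repository only the git show line matters; everything above it is scaffolding for the demo.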

The next one is the big one: getting the GitHub script to run on Windows. For some reason, every time I followed their instruction to copy/paste the main script, my terminal just ran the first line and ignored the rest. My fix was to copy it to Notepad, edit the variables to be what I needed, and then remove the first line (the #!/bin/sh shebang):

git filter-branch --env-filter '
OLD_EMAIL="murray.adcock@wrongaddress.com"
CORRECT_NAME="Murray Adcock"
CORRECT_EMAIL="murray.adcock@rightaddress.com"
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]
then
 export GIT_COMMITTER_NAME="$CORRECT_NAME"
 export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]
then
 export GIT_AUTHOR_NAME="$CORRECT_NAME"
 export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags

That ran fine for my account, but I've since done the same process with a colleague's Git history, and for him it kept throwing a fatal: bad revision error. It seems to occur after a space in the name, suggesting escape characters are being ignored, but I'm not sure why it would work for me (a person with a space in my name) and not him. The only way I could get around that error was to switch to a Git Bash terminal directly, rather than using PowerShell like I'd done before.

The other issue I ran into for him was that his original (and incorrect) author details lacked an email address completely. Take one look at the above script and you can see that makes it pretty unusable, because every check is made against the old email. I tried using the "default" Git setting of just filling that in with a lowercase version of the username, but it didn't seem to work. Luckily, it's a fairly simple script, so as long as the username itself exists you can just switch the checks to match on the old name instead:

git filter-branch --env-filter '
OLD_NAME="Wrongname"
CORRECT_NAME="First Last"
CORRECT_EMAIL="first.last@rightaddress.com"
if [ "$GIT_COMMITTER_NAME" = "$OLD_NAME" ]
then
 export GIT_COMMITTER_NAME="$CORRECT_NAME"
 export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_NAME" = "$OLD_NAME" ]
then
 export GIT_AUTHOR_NAME="$CORRECT_NAME"
 export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags

Oh, but that meant I was running the script twice, which Git isn't happy about because it's already created a backup. If you run into the same grumble, just amend the first line to force the command using -f like so:

git filter-branch -f --env-filter '
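For the curious, the backup Git complains about lives under refs/original/. The sketch below is a hedged illustration rather than anything from the GitHub guide: it uses a throwaway repository with a made-up Wrongname identity, runs a cut-down version of the name-based filter, and shows the second run being refused until -f is added. FILTER_BRANCH_SQUELCH_WARNING only exists to skip the warning pause that newer Git versions add; older versions should simply ignore it.

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1  # skips the warning pause in newer Git versions

# Throwaway repo with one commit under a made-up wrong identity.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
echo "hello" > file.txt
git add file.txt
GIT_AUTHOR_NAME="Wrongname" GIT_AUTHOR_EMAIL="wrongname@wrongaddress.com" \
GIT_COMMITTER_NAME="Wrongname" GIT_COMMITTER_EMAIL="wrongname@wrongaddress.com" \
    git commit -q -m "Commit made with the wrong author"

# A cut-down version of the name-based filter.
fix_identity='
if [ "$GIT_COMMITTER_NAME" = "Wrongname" ]
then
 export GIT_COMMITTER_NAME="First Last"
 export GIT_COMMITTER_EMAIL="first.last@rightaddress.com"
fi
if [ "$GIT_AUTHOR_NAME" = "Wrongname" ]
then
 export GIT_AUTHOR_NAME="First Last"
 export GIT_AUTHOR_EMAIL="first.last@rightaddress.com"
fi
'

# The first run rewrites the commit and leaves a backup ref behind.
git filter-branch --env-filter "$fix_identity" -- --branches >/dev/null 2>&1
backups=$(git for-each-ref --format='%(refname)' refs/original/)
echo "Backup refs: $backups"

# A second run without -f stops because that backup is in the way.
if git filter-branch --env-filter "$fix_identity" -- --branches >/dev/null 2>&1
then second_run="ran"
else second_run="refused"
fi
echo "Second run without -f: $second_run"

# With -f the old backup is discarded and the command goes ahead.
git filter-branch -f --env-filter "$fix_identity" -- --branches >/dev/null 2>&1
author=$(git log -1 --format='%an <%ae>')
echo "Author is now: $author"
```

Bear in mind that -f throws away the backup from the earlier run, so it's worth being happy with the first rewrite before forcing a second.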

Having done all of the above, I used git log to check that my changes had been applied correctly. That turned out to be super worthwhile, as it highlighted a second incorrect user profile that I must have been using in the past. Once everything looked right, I simply pushed it up as outlined in the GitHub steps.
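Scrolling through git log works, but on a long history a tally of every identity is quicker to scan. As a hedged sketch (the throwaway repository, the commit_as helper, and both identities are invented for illustration), this counts how often each name and email pair appears as author or committer across all branches:

```shell
# Throwaway repo with two commits under different, made-up identities.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
commit_as() {  # commit_as NAME EMAIL MESSAGE (hypothetical helper for this demo)
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
    GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
        git commit -q --allow-empty -m "$3"
}
commit_as "First Last" "first.last@rightaddress.com" "Correctly attributed"
commit_as "Wrongname" "wrongname@wrongaddress.com" "Wrongly attributed"

# Emit one line per author and one per committer, then tally the distinct identities.
identities=$(git log --all --format='%an <%ae>%n%cn <%ce>' | sort | uniq -c)
echo "$identities"
```

Any line that still shows an old name or address means another pass of the filter is needed.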

Once you get to the stage of pushing it up you might hit one final gotcha: protected branches. We have our master branch protected, forcing us to use pull requests to push something to production, which reduces the chance of accidents. That's fine (and probably should always be the case), until you come to something like this, where you want to update all branches at once. I'd recommend checking first and temporarily removing any such rules, but if you only find out at the point of seeing an error like this:

Git error that reads: remote rejected master, protected branch hooks declined.

...then don't despair! Just remove the rule on GitHub and rerun the command; Git's intelligent enough to only do the bits it failed at the first time around, so it should be super quick.

Great, so everything is now back up on GitHub with the correct Git history and author details 🎉 But what happens when you launch a local clone of the repository? Well, a quick git status showed nothing out of the ordinary, but git pull fetched the entire rewritten history and promptly told me I was 217 commits ahead of origin. Gulp!

Luckily, a quick search online brought up a useful (as ever) Stack Overflow answer, which solved my final issue. To abbreviate it, this is the important part:

git reset --hard origin/<branch>

Armed with a new piece of Git knowledge, the final steps were just to check out each branch locally and run the above command. That reset the `HEAD` and hey presto, we're done. Actually a fairly painless process, once you walk through the steps, and the result is a shiny GitHub account with over 300 "new" (read: correctly tracked and attributed) commits. Satisfying to say the least 😉
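If you have more than a couple of branches, a small loop saves some typing. This is a minimal sketch and not something from the Stack Overflow answer: the setup half fakes an origin and a stale local clone in a temporary folder (all names are made up), and the loop then resets every local branch that has a counterpart on origin. Note that reset --hard discards any local commits and changes on those branches, which is exactly what's wanted here but worth knowing.

```shell
# Demo setup: a bare "origin" plus a working clone (all paths and names are made up).
scratch=$(mktemp -d)
cd "$scratch"
export GIT_AUTHOR_NAME="First Last" GIT_AUTHOR_EMAIL="first.last@rightaddress.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
git init -q --bare origin.git
git clone -q origin.git work 2>/dev/null
cd work
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
git branch feature
git push -q origin --all
# Make the local copy diverge from origin, standing in for the stale pre-rewrite history.
git commit -q --amend -m "Stale local copy"

# The actual loop: reset every local branch that has a counterpart on origin.
git fetch -q origin
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/)
do
    # Skip branches that only exist locally.
    git rev-parse -q --verify "refs/remotes/origin/$branch" >/dev/null || continue
    git checkout -q "$branch"
    git reset -q --hard "origin/$branch"
done
```

In a real clone only the part from git fetch onwards is needed.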

Footnotes