Angad Dubey

invalid author/committer line - bad email (Bitbucket to GitHub)

I'm trying to move an app from Bitbucket to GitHub, but I keep getting this error:

remote: error: object 77397629d92d89204249029e019b72584b1598fc: badEmail: invalid author/committer line - bad email
remote: fatal: Error in object
error: pack-objects died of signal 13

I ran git fsck and got this result:

Checking object directories: 100% (256/256), done.
error in commit 77397629d92d89204249029e019b72584b1598fc: invalid author/committer line - bad email
Checking objects: 100% (29897/29897), done.
dangling commit d100575b2dadb8f6bc8008cb31c3d0655c9923de
dangling commit 4c053ba6555079d75d113809181a4513ed9335e5
dangling blob 08073e63069579f0cd2c4939db7500ca803f8295
dangling blob d3160dde572502bc4c72e109f61488948d7a2da9
dangling blob cb24bd924a453b549f822d3ecdd72f80c46df01c
dangling commit 2a253781da5caec8017f97d41b7c7ac55c5cf19b
dangling commit 06372ddb9c0022e4357a7d5b70b9599a82562704
dangling blob c4555c38113dc7ff25a296820a42de907d86b938
dangling blob 09665ec011df5fd9cfa4e28d9abb5f9ba560d712
dangling blob 2e6ab5601d4ee4a908a8a778c35e6c7b7a46e110
dangling blob d275b7173899fe7852d27b7fb60a93b1b295671c
dangling commit f39b41ca7fb7fbc504b8f093517ac05c35b9f077
dangling commit d7b4c781543b10c0b593e574835215e537a190a5
dangling blob f1bf967a4c28a4ee54d8012967d8313eb9cee1d9
dangling commit adc3d626e79c045ed976d6d1cb21b3880aed3b65
dangling blob aee1252e9e9770a89b77f21bf394b1d3feaf8238
dangling commit 55e459861ecc57ab2e67c9bd92edc0bc1dcbcb6a
dangling commit b7f0dcf6aa1a492588c056fe0804f4ac5223dc3e

How do I fix that commit data, or maybe get rid of it completely?

Answers (1)

torek

You cannot change any commit, ever. There is a path you can follow, but you will need to understand all the consequences of that rule first.

A Git repository is, with a few minor but important additions, basically just a big database of Git objects. There are four types of objects in the database: commits, (annotated) tags, trees, and blobs. The blobs are your files. The trees provide Git with the names for the files, because these objects all have Git hashes as their actual names. The tags we can just ignore for the moment, and the commits are where your problem comes in.

Git object names are hashes

There's a key concept embedded in the paragraph above: the actual name for any Git object, and hence the "true name" of any commit object as well, is a Git hash. In your particular case, we are seeing this:

object 77397629d92d89204249029e019b72584b1598fc: badEmail:
invalid author/committer line

It's commit objects that have author and committer lines, so 7739762... is clearly a commit object, and equally clearly, some version of Git allowed that object but the version you're now using does not.

You cannot change any existing Git object

One of the key tricks that makes Git work at all is the fact that each object's name—the big ugly hash ID—is actually a cryptographic checksum (a message digest value) of the contents of the object. This provides several crucial properties, one of which is now working against you.

Specifically, if we want to fix a bad line in a commit, we have to make a copy of that commit with slightly different data in it. The checksum of the new (altered) commit will be different, and hence its hash will be different. This alone is not so bad—what's one hash among thousands or millions of hashes?—but it has a ripple of consequences.
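To make the checksum property concrete, here is a minimal sketch that computes a blob's ID twice: once with git hash-object, and once by hashing the object header plus content by hand. It assumes the default SHA-1 object format and a sha1sum utility on the PATH (on some systems the equivalent is shasum); the variable names are purely illustrative.

```shell
# Compute the ID of a blob containing "hello\n" in two ways.
# Assumes the default SHA-1 object format and a sha1sum binary on PATH.
content='hello'

# 1. Ask Git directly.
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

# 2. Hash "blob <size-in-bytes>", a NUL byte, then the content, by hand.
#    "hello\n" is 6 bytes long.
manual_id=$(printf 'blob 6\0%s\n' "$content" | sha1sum | cut -d' ' -f1)

echo "git:    $git_id"
echo "manual: $manual_id"
```

Both lines should show the same ID. Change a single byte of the content and both values change together, which is why a "fixed" copy of a commit necessarily gets a new name.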

Git commits contain hash IDs, so they form chains

Let's take a look at the first part of an actual commit:

$ git cat-file -p HEAD
tree a775288b86ae652ea163357939d852cdd927eed6
parent 36cafe44443fcca9eb35399ef0e9bfe289ec5dde

(the next two lines are "author" and "committer"; this is a commit in the Git repository for Git).

You can examine the bad commit in your own repository this same way: run git cat-file -p 77397629d92d89204249029e019b72584b1598fc. You'll see a "tree" line, probably one "parent", and the author and committer, plus the commit message itself. One or both of the author/committer lines will have an improperly formed email address.
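If you only want the two identity lines, a small helper along these lines may be handy. The function name show_idents is made up for this sketch; it relies only on git cat-file and sed, and stops at the first blank line so that nothing in the commit message is matched by accident.

```shell
# Hypothetical helper: print only the author and committer header lines
# of a commit. A well-formed line looks like:
#   author Some Name <user@example.com> 1234567890 +0000
show_idents() {
    # The header ends at the first blank line; quit there so that text in
    # the commit message that happens to start with "author " is ignored.
    git cat-file -p "$1" | sed -n '/^$/q; /^author /p; /^committer /p'
}
```

In the repository from the question you would run show_idents 77397629d92d89204249029e019b72584b1598fc and look for whatever is off in the part between < and >.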

You can copy this commit to a fixed version, or you can drop it entirely, but both of these approaches have problems. The problem with dropping it entirely seems obvious (you lose the commit itself) but in fact, it has the same problem as copying. Let's draw part of a commit graph, including the bad commit:

... <- o <- o <- B <- C <- o <- ...

Here each round o node represents a commit, with the regular o replaced with B for the particular bad commit, and C for its subsequent (child) commit. Each commit has the ID of its parent in it. What happens if we delete B entirely? We'll get:

... <- o <- o    ? <- C <- o <- ...

There's a hole! C points nowhere—it tries to point to B but we've somehow removed B—and now nobody points to the commit that came before B.

So, instead, let's copy from B, the bad commit, to G, a good commit, that has the same tree and parent lines and the same commit message, but a fixed-up author and/or committer:

... <- o <- o <- B <- C <- o <- ...
               \
                 G

That's better ... except C still points to B. Well, we can fix that! Let's copy C too, with its copy pointing to G instead of B:

... <- o <- o <- B <- C <- o <- ...
               \
                 G <- o

So far so good, but now the commit node after C still points to C, so we have to copy that as well, and so on down the line, all the way to the tip-most commit:

... <- o <- o <- B <- C <- o <- ...
               \
                 G <- o <- o <- ...

Now that we've copied (and fixed up) not only B, but every commit after B too, we can abandon the original bad chain from B onward:

... <- o <- o <- B <- C <- o <- ...   (abandoned chain)
               \
                 G <- o <- o <- ...   (replacement chain)
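The copy-and-re-point idea can be reproduced by hand in a throwaway repository. The following is only a sketch using the plumbing command git commit-tree: the names B, C, G and C2 mirror the diagram (C2 is the copy of C), and the email addresses and dates are invented for the demonstration.

```shell
set -e
# Build a tiny throwaway repository with the chain B <- C, then copy it as G <- C2.
cd "$(mktemp -d)"
git init -q .
tree=$(git write-tree)    # the empty tree; file contents do not matter here

# Fixed name and dates, so that only the email (or the parent) differs between copies.
export GIT_AUTHOR_NAME='Dev' GIT_COMMITTER_NAME='Dev'
export GIT_AUTHOR_DATE='946684800 +0000' GIT_COMMITTER_DATE='946684800 +0000'

export GIT_AUTHOR_EMAIL='old@example.com' GIT_COMMITTER_EMAIL='old@example.com'
B=$(git commit-tree "$tree" -m 'B')
C=$(git commit-tree "$tree" -p "$B" -m 'C')

# G: same tree and message as B, but a different email.
export GIT_AUTHOR_EMAIL='new@example.com' GIT_COMMITTER_EMAIL='new@example.com'
G=$(git commit-tree "$tree" -m 'B')

# C2: identical to C in every field except its parent line.
export GIT_AUTHOR_EMAIL='old@example.com' GIT_COMMITTER_EMAIL='old@example.com'
C2=$(git commit-tree "$tree" -p "$G" -m 'C')

echo "B=$B G=$G"
echo "C=$C C2=$C2"
```

B and G differ because the email differs. C and C2 differ even though nothing in C itself was "wrong": only the parent hash changed, and that alone is enough to give the copy a new ID.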

This is in fact how git rebase works, so if bad commit B is contained in just one branch, we could use a rebase command (probably an interactive one) to get this result.
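To find out whether the bad commit really is confined to one branch, you can list every ref whose history contains it. This is a small sketch built on git for-each-ref --contains; the function name affected_refs is invented here.

```shell
# Hypothetical helper: list every ref (branches, remote-tracking branches,
# tags) whose history includes the given commit. Each ref listed would have
# to move to a rewritten copy once the commit is replaced.
affected_refs() {
    git for-each-ref --format='%(refname)' --contains="$1"
}
```

If affected_refs 77397629d92d89204249029e019b72584b1598fc prints a single branch and no tags, an interactive rebase of that branch may be enough; more than one line suggests a wider rewrite.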

git filter-branch

There's a more powerful Git command, though, that does what rebase does (copy commits, then adjust the external branch names that point to the tips of branches) but does it over a much bigger range of commits. It can handle the case where bad commits are on more than one branch, and it can fix up tag references and tag objects as well. (The rebase command ignores tags entirely.) That command is git filter-branch.

One big problem with git filter-branch is that it is too powerful, and yet at the same time, not at all interactive. You must specify, when you run it, exactly what it is to do, on which branch(es), and how it should fix up tags, if it should fix them up at all.

It's also quite slow. In principle, git filter-branch extracts every original commit into a temporary tree, modifies the temporary tree as desired, and then makes a new, possibly-modified commit from the possibly-modified tree, with a possibly-modified commit message, author, committer, and so on. It can even skip the commit entirely, which effectively deletes the original. As it copies each commit, it also sets up a mapping: old commit ID X maps to new commit ID Y. When it's finished copying all the old commits, filter-branch uses this map to adjust all your branch names (and, if requested, tag names and tag objects as well [1]) to use the new IDs.

To combat both the usability and slowness problems, git filter-branch has a dizzying array of filter options. In some ways, this just makes the usability problem worse, because you need to know a whole lot about the internals of Git to choose the correct filters, and to program them to do the correct thing to the correct commit(s).

In this particular case, the filter you want is the --env-filter:

--env-filter <command>

       This filter may be used if you only need to modify the environment in which the commit will be performed. Specifically, you might want to rewrite the author/committer name/email/time environment variables (see git-commit-tree(1) for details). Do not forget to re-export the variables.

There is an example in the documentation for fixing Git author and/or email. This is pretty much exactly what you need—you just need to look at the failing object to see what it has that's wrong, and modify the test in the example to test for that, rather than testing for root@localhost or whatever. You probably also want to use --tag-name-filter cat -- --all at the end of your filter operation, to adjust tags if needed. Read the documentation and search for "tag".
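As a rough sketch of what that could look like: the documentation's example tests the old email value, but since git fsck reported exactly one bad commit, this variant tests $GIT_COMMIT (which filter-branch sets to the ID of the commit being rewritten) instead. The function name fix_commit_email and its arguments are invented for illustration, the replacement address is whatever you decide is correct, and the whole thing is meant to be run inside a throwaway clone.

```shell
# Hypothetical helper: give one commit a new author/committer email.
# Every descendant of that commit is re-created as a side effect.
# Assumes the email contains no quote characters.
fix_commit_email() {
    # Resolve to a full hash, because $GIT_COMMIT is always a full hash.
    bad_commit=$(git rev-parse --verify "$1^{commit}") || return 1
    good_email=$2

    # FILTER_BRANCH_SQUELCH_WARNING only silences the startup notice that
    # newer Git versions print; older versions simply ignore it.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --env-filter '
        if test "$GIT_COMMIT" = "'"$bad_commit"'"
        then
            GIT_AUTHOR_EMAIL="'"$good_email"'"
            GIT_COMMITTER_EMAIL="'"$good_email"'"
            export GIT_AUTHOR_EMAIL GIT_COMMITTER_EMAIL
        fi
    ' --tag-name-filter cat -- --all
}
```

For the repository in the question the call would be something like fix_commit_email 77397629d92d89204249029e019b72584b1598fc you@example.com. Afterwards, git cat-file -p on the rewritten commit should show well-formed author and committer lines. If the name rather than the email turns out to be the malformed part, GIT_AUTHOR_NAME and GIT_COMMITTER_NAME can be set the same way.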

Caveats

Always remember two things when using filter-branch:

  • Work on a clone. If you break stuff, no problem! It's just a clone and you can throw it out and start over. This means you truly don't need to worry about breaking stuff: run the filter-branch operation, check it over carefully to see if it worked, and if not, throw out the clone and start over. It's slow, but completely safe.

  • When you're done, you have not just fixed a bad commit. Every commit after the bad one has been replaced with a copy. This means that everyone who has a clone of the original repository must now rebase their own work onto the new, copied history rather than the original. You are making extra work for everyone else. If you must replace or remove an old commit, there is no way around this. But it does mean you should consider any options you may have for leaving bad commits in place.


[1] Tag-object-copy actually happens during the commit-object-copy phase. In theory, external references (branch and tag names) could be made as soon as the targeted original object has its new mapped object, but it's easier, in code terms, to just collect up the mappings into a file as we go, and then adjust all the external references in a final pass. Tag objects get copied during the "object copy" pass because they are objects. Of course, they only get copied at all if you specify a --tag-name-filter in the first place.

Tag object copies also (necessarily) lose their GPG signatures, so if you don't need to copy any tag objects, you may want to skip this.
