Reputation: 159
Is there a way in Git to rewrite the commit encoding header? I have some commits whose author name is encoded in ISO-8859-1, but the commit encoding header is empty, so the commit is treated as UTF-8. This causes some applications (e.g. GitLab) to fail when decoding the commit. The same applies to some commit messages.
Any ideas?
Upvotes: 2
Views: 4044
Reputation: 21
If the commit messages are really messed up and Sascha's solution does not work (because file -b --mime-encoding - does not report the real encoding), you can use the following to strip every non-ASCII character from the commit messages:
git filter-branch --msg-filter '
perl -pe "s/[^[:ascii:]]//g"
' HEAD
Clearly, this is far from perfect, since it removes every non-English character such as umlauts, but in some circumstances (e.g. a Git repository converted from an old CVS repository with terribly encoded commit messages) it might be the only automatic solution.
Upvotes: 2
Reputation: 159
Solved it this way:
$ git filter-branch -f --commit-filter '
# Detect the encoding of each name, then convert it to UTF-8.
author_type=$( echo "$GIT_AUTHOR_NAME" | file -b --mime-encoding - )
author=$( echo "$GIT_AUTHOR_NAME" | iconv -f "$author_type" -t UTF-8 )
GIT_AUTHOR_NAME=$author
committer_type=$( echo "$GIT_COMMITTER_NAME" | file -b --mime-encoding - )
committer=$( echo "$GIT_COMMITTER_NAME" | iconv -f "$committer_type" -t UTF-8 )
GIT_COMMITTER_NAME=$committer
git commit-tree "$@";' --msg-filter '
# Save the message so it can be read twice: once to detect, once to convert.
cat > .commitmsg
type=$( file -b --mime-encoding - < .commitmsg )
iconv -f "$type" -t UTF-8 < .commitmsg
' HEAD
$ rm -f .commitmsg
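If file guesses an encoding that iconv does not accept, one alternative is to skip detection and only convert text that is not already valid UTF-8. The sketch below assumes that everything else is ISO-8859-1, as in the question; to_utf8 is a hypothetical helper, not something Git provides.

```shell
# Hypothetical helper: convert a string to UTF-8 unless it already is UTF-8.
to_utf8() {
    # A UTF-8 to UTF-8 conversion fails on invalid input, so it works as a validity check.
    if printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        printf '%s' "$1"
    else
        # Assumption: any other input is ISO-8859-1.
        printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
    fi
}

# "Juergen" with an ISO-8859-1 u-umlaut (byte \374) becomes UTF-8 (bytes \303\274).
to_utf8 "$(printf 'J\374rgen')"; echo
```

To use it with filter-branch, the function definition would have to go inside the filter string itself, for example GIT_AUTHOR_NAME=$(to_utf8 "$GIT_AUTHOR_NAME").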
Upvotes: 3
Reputation: 489618
Testing (with Git 2.2.0) shows that git commit adds encoding <blah> to the commit headers whenever you make a new commit with i18n.commitencoding = blah. This includes "amended" commits, which are just new commits whose parent(s) is/are HEAD's parent(s). So, given an existing commit at HEAD that you wish to mark, simply run git commit --amend and exit the editor to write a new (different) HEAD commit with the additional header line.
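As a rough illustration of the amend approach, here is a sketch that runs in a throwaway repository. The identity, file name, and commit message are placeholders, and --no-edit is used instead of opening and closing the editor.

```shell
repo=$(mktemp -d)            # throwaway repository, so no real history is touched
cd "$repo"
git init -q
git config user.name "Example Author"         # placeholder identity
git config user.email "author@example.com"
echo hello > file.txt
git add file.txt
git commit -q -m "initial commit"

# Declare the encoding, then amend HEAD without changing its message.
git config i18n.commitEncoding ISO-8859-1
git commit -q --amend --no-edit

# Look at the raw commit object and pick out the new header line.
header=$(git cat-file -p HEAD | grep '^encoding')
echo "$header"
# prints: encoding ISO-8859-1
```

The amended commit gets a new ID, so this rewrites history in the same way any other amend does.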
I did not test git rebase -i, but since that runs actual cherry-pick operations, and edit mode allows you to use git commit --amend to make a new HEAD commit, it will certainly work. The mechanics may not be the prettiest.
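For more than one commit, one possible way to automate this is the --exec option of git rebase, which runs a command after each replayed commit. This is a sketch in a throwaway repository and not something tested in the answer; the identity, file name, and the HEAD~2 range are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"         # placeholder identity
git config user.email "author@example.com"
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

git config i18n.commitEncoding ISO-8859-1
# Replay the last two commits and amend each one right after it is picked.
git rebase -q --exec 'git commit -q --amend --no-edit' HEAD~2

# Count how many of the two replayed commits now carry an encoding header.
marked=0
for rev in HEAD HEAD~1; do
    if git cat-file -p "$rev" | grep -q '^encoding '; then
        marked=$((marked + 1))
    fi
done
echo "$marked of 2 replayed commits carry the header"
```

As with any rebase, every replayed commit gets a new ID, so this is best done before the history is shared.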
To see a raw commit (including its encoding line), use git cat-file -p HEAD (or some other commit ID in place of HEAD).
(As eis notes in a comment, it's probably better to use UTF-8 in the first place. You can of course do this as part of the amend process, although it may or may not be tricky, depending on your editor.)
Upvotes: 1