Sascha

Reputation: 159

How to change a Git commit's encoding header?

Is there a way in Git to rewrite a commit's encoding header? I have some commits whose author name is encoded in ISO-8859-1, but the commit has no encoding header, so UTF-8 is assumed. This causes some applications (e.g. GitLab) to fail when decoding the commit. The same applies to some commit messages.

Any ideas?

Upvotes: 2

Views: 4044

Answers (3)

raggi

Reputation: 21

If the commit messages are really messed up and Sascha's solution does not work (because file -b --mime-encoding - guesses the encoding wrongly), the following removes every non-ASCII character from the commit messages:

git filter-branch --msg-filter '
  perl -pe 's/[^[:ascii:]]//g;'
  ' HEAD

Clearly, this is far from perfect, since it removes every non-English character, such as umlauts, but in some circumstances (e.g. a Git repository converted from an old CVS repository with badly encoded commit messages) it may be the only automatic solution.

Upvotes: 2

Sascha

Reputation: 159

Solved it this way:

$ git filter-branch -f --commit-filter '
author_type=$( echo "$GIT_AUTHOR_NAME" | file -b --mime-encoding - )
author=$( echo "$GIT_AUTHOR_NAME" | iconv -f "$author_type" -t UTF-8 )
GIT_AUTHOR_NAME=$author

committer_type=$( echo "$GIT_COMMITTER_NAME" | file -b --mime-encoding - )
committer=$( echo "$GIT_COMMITTER_NAME" | iconv -f "$committer_type" -t UTF-8 )
GIT_COMMITTER_NAME=$committer

git commit-tree "$@";' --msg-filter '
cat > .commitmsg
type=$( file -b --mime-encoding .commitmsg )
iconv -f "$type" -t UTF-8 .commitmsg
' HEAD

$ rm -f .commitmsg
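Since `file -b --mime-encoding` can guess wrong (as raggi points out), a variant that does not guess may be safer. The sketch below assumes, as the question states, that anything which is not already valid UTF-8 is ISO-8859-1; the function name `to_utf8` is made up for illustration. It relies on `iconv -f UTF-8 -t UTF-8` exiting with an error on invalid input, as GNU iconv does; check that your iconv behaves the same way.

```sh
#!/bin/sh
# to_utf8 (hypothetical helper): copy stdin to stdout as UTF-8.
# Assumption: input that is not valid UTF-8 is ISO-8859-1, so nothing is guessed.
to_utf8() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  if iconv -f UTF-8 -t UTF-8 "$tmp" > /dev/null 2>&1; then
    cat "$tmp"                            # already valid UTF-8 (or plain ASCII)
  else
    iconv -f ISO-8859-1 -t UTF-8 "$tmp"   # every byte is valid ISO-8859-1
  fi
  rm -f "$tmp"
}

# Example: "J\366rg" is "Jörg" encoded in ISO-8859-1.
name=$(printf 'J\366rg' | to_utf8)
printf '%s\n' "$name"
```

Saved as an executable script on your PATH (the name `to-utf8` is again hypothetical), it could replace the `file`/`iconv` pairs above, e.g. `--msg-filter to-utf8` plus an `--env-filter` containing `GIT_AUTHOR_NAME=$(printf '%s' "$GIT_AUTHOR_NAME" | to-utf8)` and the same for `GIT_COMMITTER_NAME`. Unlike the `.commitmsg` approach, it leaves no file behind.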

Upvotes: 3

torek

Reputation: 489618

Testing (with Git 2.2.0) shows that git commit adds an encoding <blah> line to the commit headers whenever you make a new commit with i18n.commitencoding = blah. This includes "amended" commits, which are just new commits whose parent(s) are HEAD's parent(s). So, given an existing commit at HEAD that you wish to mark, simply run git commit --amend and exit the editor; this writes a new (different) HEAD commit with the additional header line.
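The amend route can be tried out safely in a scratch repository. This sketch uses `git -c` to set i18n.commitEncoding for a single command and `--no-edit` to skip the editor; the identity values are placeholders. If the existing message contains non-ASCII text, amending may also re-encode it, so inspect the result with git cat-file before pushing anything.

```sh
#!/bin/sh
set -e
# Throwaway repository, so no real history is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity for this demo only.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > file.txt
git add file.txt
git commit -q -m 'initial commit'
git cat-file -p HEAD    # no "encoding" line: UTF-8 is implied

# Set the commit encoding for this one command only and rewrite HEAD.
git -c i18n.commitEncoding=ISO-8859-1 commit -q --amend --no-edit
git cat-file -p HEAD    # should now include "encoding ISO-8859-1"
```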

I did not test git rebase -i, but since it runs actual cherry-pick operations, and edit mode lets you use git commit --amend to make a new HEAD commit, it should work. The mechanics may not be the prettiest.
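For commits further back, one variation on the rebase idea (swapped in here, not taken from the answer) is to let rebase run the amend itself via `--exec` instead of stopping at each commit in edit mode. The sketch below, again in a scratch repository with placeholder names, relies on settings given with `git -c` being passed down to the git commands that the exec lines start; `--force-rebase` makes rebase replay the commits instead of fast-forwarding over them. Like filter-branch, this rewrites every commit after the base.

```sh
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done

# Replay the last two commits and amend each one with the encoding
# setting in effect (the -c setting is inherited by the exec'd git).
git -c i18n.commitEncoding=ISO-8859-1 rebase -q --force-rebase \
    --exec 'git commit -q --amend --no-edit' HEAD~2

for rev in HEAD HEAD~1 HEAD~2; do
  printf '%s: ' "$rev"
  git cat-file -p "$rev" | grep '^encoding ' || echo '(no encoding header)'
done
```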

To see a raw commit (including its encoding line) use git cat-file -p HEAD (or some other commit-ID in place of HEAD).
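To find the commits that need fixing in the first place, the same raw view can be checked mechanically. The sketch below (the function name `find_non_utf8_commits` is made up) lists every reachable commit that has no encoding header but whose raw object, author and committer lines included, is not valid UTF-8. It assumes an iconv that rejects invalid input when converting UTF-8 to UTF-8, as GNU iconv does.

```sh
#!/bin/sh
# find_non_utf8_commits (hypothetical helper): print IDs of commits that
# declare no encoding yet contain bytes that are not valid UTF-8.
find_non_utf8_commits() {
  git rev-list --all | while read -r commit; do
    # Header lines end at the first empty line of the raw commit object;
    # skip commits that already declare an encoding there.
    if git cat-file commit "$commit" | LC_ALL=C sed '/^$/q' |
        LC_ALL=C grep -q '^encoding '; then
      continue
    fi
    # iconv exits non-zero at the first invalid UTF-8 sequence.
    if ! git cat-file commit "$commit" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
      printf '%s\n' "$commit"
    fi
  done
}
```

Run `find_non_utf8_commits` from inside the repository, before and after a rewrite, to confirm that nothing is left.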

(As eis notes in a comment, it is probably better to use UTF-8 in the first place. You can of course convert the message to UTF-8 as part of the amend process, although that may or may not be tricky depending on your editor.)

Upvotes: 1
