Oscar Fanelli

Reputation: 3657

git clone in utf8 instead of us-ascii

When I run git clone SOMEREPO from GitHub, the files I receive are in us-ascii:

$ file -bi index.php
text/plain; charset=us-ascii

How can I receive them in utf8?

Thanks

Upvotes: 3

Views: 9807

Answers (2)

Alexandre Pereira Nunes

Reputation: 1088

There are two ways to detect that a file is UTF-8: implicit or explicit. In the implicit form, you have to look at the content and guess. Remember that UTF-8 is a superset of ASCII, so if a particular file doesn't actually contain any non-ASCII characters, there is no way to tell whether it is UTF-8 or ASCII, and guessing tools will report ASCII.
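To make the implicit case concrete, here is a minimal sketch that counts the bytes outside the 7-bit ASCII range in a file. The helper name count_non_ascii and the sample file names are made up for illustration. When the count is zero, the file is byte-for-byte valid as both ASCII and UTF-8, which is why a guessing tool has nothing to go on.

```shell
#!/bin/sh
# count_non_ascii FILE: print how many bytes in FILE are outside 0x00-0x7F.
# (count_non_ascii is a hypothetical helper name, not a standard tool.)
count_non_ascii() {
    # Delete every 7-bit ASCII byte, then count whatever is left.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

tmp=$(mktemp -d)
printf 'echo "hello";\n' > "$tmp/ascii.php"
printf 'echo "caf\303\251";\n' > "$tmp/utf8.php"   # \303\251 is "é" in UTF-8

count_non_ascii "$tmp/ascii.php"   # prints 0
count_non_ascii "$tmp/utf8.php"    # prints 2
```

A file like the first one is the kind that detection tools would be expected to label us-ascii, even though it is also perfectly valid UTF-8.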

In the explicit form, there is a BOM (byte order mark) at the start of the file, indicating that it is meant to be read as UTF-8. This was borrowed from UCS-2/UTF-16, where it was needed to indicate both the encoding and the byte order. UTF-8 has no byte order (or is byte-order agnostic, if you prefer).
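The UTF-8 BOM is the three-byte sequence EF BB BF. As a rough sketch, a check for it could look like the following; the helper name has_utf8_bom and the sample files are assumptions for illustration only.

```shell
#!/bin/sh
# has_utf8_bom FILE: succeed if FILE starts with the bytes EF BB BF.
# (has_utf8_bom is a hypothetical helper name.)
has_utf8_bom() {
    # od -N3 reads only the first three bytes, -An drops the offset column,
    # and -tx1 prints each byte as two hex digits.
    [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

tmp=$(mktemp -d)
printf '\357\273\277<?php\n' > "$tmp/with_bom.php"
printf '<?php\n' > "$tmp/without_bom.php"

for f in "$tmp/with_bom.php" "$tmp/without_bom.php"; do
    if has_utf8_bom "$f"; then
        echo "$f: BOM present"
    else
        echo "$f: no BOM"
    fi
done
```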

The implicit form is the usual one on Linux and almost every UTF-8 compliant system, where the explicit form is not recommended. The exception (as usual) is Windows, where most editors can only recognize UTF-8 when there is a BOM, because UTF-8 is not entirely natively supported there. The usual forms on Windows are either a code page or UCS-2, which is slowly progressing towards UTF-16; UCS-2 is a limited subset of UTF-16 that cannot represent characters needing more than two bytes.

If you want a particular tool to assume UTF-8 instead of ASCII, you may have to either provide a BOM (explicit form), configure the tool, or even switch to a different tool. For example, an Apache HTTP server may assume ASCII by looking at a file's contents, but you can override its settings to make it report UTF-8 unconditionally (or the other way around).

Upvotes: 1

Simon PA

Reputation: 746

git clone will retrieve the files "as is" from the repository. If you want to work with UTF-8, you have to convert them.

Run this script in your root folder and commit the changes. Note that Git sometimes doesn't detect encoding changes.

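#!/bin/sh
# Re-encode every file under the current directory, skipping Git's own metadata.

find . -path ./.git -prune -o -type f ! -name '*.recode.*' -print | while IFS= read -r f; do
        # Convert into a temporary copy first so a failed conversion leaves the original intact.
        if iconv -f us-ascii -t utf-8 < "$f" > "$f.recode.$$"; then
                # Overwrite the contents in place to keep the file's permissions.
                cat "$f.recode.$$" > "$f"
        fi
        rm -f "$f.recode.$$"
done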
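One likely reason Git does not notice such a conversion: for a file that contains only ASCII characters, the UTF-8 output is byte-for-byte the same as the input. The following quick check, using a made-up sample file, is a sketch of how to confirm that before committing.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '<?php echo "hello";\n' > "$tmp/index.php"

# Convert a copy and compare it byte-for-byte with the original.
iconv -f us-ascii -t utf-8 < "$tmp/index.php" > "$tmp/index.converted.php"

if cmp -s "$tmp/index.php" "$tmp/index.converted.php"; then
    echo "identical bytes: nothing for Git to record"
else
    echo "bytes differ"
fi
```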

If your editor is saving your files as us-ascii, you can probably change that setting. If not, another solution is to have Git encode your files as UTF-8 before each commit.

For that part, see: https://stackoverflow.com/a/11053818/3445619
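The linked answer may describe a different mechanism, but one way to get this effect is a Git clean/smudge filter. The sketch below makes several assumptions: the working files are ISO-8859-1, the filter name utf8 and the function name setup_utf8_filter are made up, and only *.php files are covered. It is meant to be run inside an existing Git working tree.

```shell
#!/bin/sh
# setup_utf8_filter: register a filter that stores *.php files as UTF-8.
# Assumptions: working files are ISO-8859-1; "utf8" is an arbitrary filter name.
setup_utf8_filter() {
    # clean runs when a file is staged (working tree -> repository).
    git config filter.utf8.clean 'iconv -f ISO-8859-1 -t UTF-8'
    # smudge runs on checkout (repository -> working tree).
    git config filter.utf8.smudge 'iconv -f UTF-8 -t ISO-8859-1'
    # Attach the filter to the files it should apply to.
    printf '*.php filter=utf8\n' >> .gitattributes
}
```

The filter commands live in the local Git configuration, so each clone would need to run setup_utf8_filter once, while the .gitattributes file can be committed and shared. Characters that have no ISO-8859-1 equivalent would make the smudge step fail, so this only suits repositories whose content fits that encoding.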

Upvotes: 1
