Reputation: 3657
When I run git clone SOMEREPO from GitHub, the files I receive are in us-ascii:
$ file -bi index.php
text/plain; charset=us-ascii
How can I receive them in UTF-8?
Thanks
Upvotes: 3
Views: 9807
Reputation: 1088
There are two ways to tell that a file is UTF-8: implicit or explicit. In the implicit form, a tool has to look at the content and guess. Remember that UTF-8 is a superset of ASCII, so if a particular file contains no non-ASCII characters, its bytes are identical in both encodings. There is then no way to distinguish UTF-8 from ASCII, and guessing tools will report ASCII.
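To make the implicit case concrete, here is a minimal sketch. The scratch directory and file names are made up for illustration, and it assumes iconv, cmp and (optionally) file are installed:

```sh
#!/bin/sh
# Hypothetical scratch files, for illustration only.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/ascii.txt"          # only ASCII bytes
printf 'h\303\251llo\n' > "$tmpdir/accent.txt"  # "héllo": é is the two bytes C3 A9 in UTF-8

# Re-encoding pure ASCII as UTF-8 produces exactly the same bytes.
iconv -f us-ascii -t utf-8 < "$tmpdir/ascii.txt" > "$tmpdir/ascii.utf8.txt"
if cmp -s "$tmpdir/ascii.txt" "$tmpdir/ascii.utf8.txt"; then
    same_bytes=yes
else
    same_bytes=no
fi
echo "identical after conversion: $same_bytes"

# A content-based guesser can only tell the files apart when non-ASCII
# bytes are present. Skipped if file(1) is not installed.
command -v file >/dev/null 2>&1 && file -bi "$tmpdir/ascii.txt" "$tmpdir/accent.txt"
```

On a typical Linux box, file should report charset=us-ascii for the first file and charset=utf-8 for the second, even though both are valid UTF-8.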
In the explicit form, there is a BOM (byte order mark) at the start of the file, signalling that it is meant to be read as UTF-8. This was borrowed from UCS-2/UTF-16, where it was needed to indicate both the encoding and the byte order. UTF-8 has no byte order (or is byte-order agnostic, if you prefer).
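For reference, the UTF-8 BOM is the three bytes EF BB BF. A rough way to check for it from the shell could look like this (has_utf8_bom and the scratch files are hypothetical names, not part of any standard tool):

```sh
#!/bin/sh
# has_utf8_bom FILE: succeed if FILE starts with the bytes EF BB BF.
# (Hypothetical helper name, for illustration.)
has_utf8_bom() {
    [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical scratch files: one with a BOM, one without.
bomdir=$(mktemp -d)
printf '\357\273\277hello\n' > "$bomdir/with_bom.txt"   # octal 357 273 277 = EF BB BF
printf 'hello\n' > "$bomdir/without_bom.txt"

for f in "$bomdir/with_bom.txt" "$bomdir/without_bom.txt"; do
    if has_utf8_bom "$f"; then
        echo "$f: starts with a UTF-8 BOM"
    else
        echo "$f: no BOM"
    fi
done
```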
The implicit form is the norm on Linux and almost every other UTF-8-compliant system, where the explicit form is not recommended. The exception (as usual) is Windows, where most editors recognize UTF-8 only if there is a BOM, because UTF-8 is not entirely natively supported: the usual forms there are either a code page or UCS-2, which is slowly progressing towards UTF-16. UCS-2 is a poor subset of UTF-16 that cannot correctly handle characters needing more than two bytes.
If you want a particular tool to assume UTF-8 instead of ASCII, you may have to provide a BOM (explicit form), configure the tool, or even switch to a different one. For example, an Apache HTTP server may assume ASCII by looking at the file's content, but you can override its settings to make it report UTF-8 unconditionally (or the other way around).
Upvotes: 1
Reputation: 746
git clone will retrieve the files "as is" from the repository. If you want to work with UTF-8, you have to convert them.
Run this script in your root folder and commit the changes. Note that git sometimes doesn't detect the encoding change; in particular, a pure-ASCII file is byte-for-byte identical in UTF-8, so there is nothing for git to record.
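#!/bin/sh
# Re-encode every regular file under the current directory, skipping .git
# and this script's own temporary files.
find . -path ./.git -prune -o -type f ! -name '*.recode.*' -print | while IFS= read -r f; do
    # Convert into a temporary file; overwrite the original only on success.
    if iconv -f us-ascii -t utf-8 < "$f" > "$f.recode.$$"; then
        cat "$f.recode.$$" > "$f"
    fi
    rm -f "$f.recode.$$"
done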
If your editor is saving your files in us-ascii, you can probably change that in its settings. If not, another solution is to let git encode your files in UTF-8 before each commit.
For that part, see: https://stackoverflow.com/a/11053818/3445619
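One mechanism git offers for this kind of thing (not necessarily what the linked answer describes) is a clean/smudge filter set up through .gitattributes. The sketch below is only an illustration: the filter name to-utf8, the throwaway repository, and the assumption that the editor writes ISO-8859-1 are all made up for the example.

```sh
#!/bin/sh
# Hypothetical throwaway repository, so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Assumed filter name "to-utf8" and assumed editor encoding ISO-8859-1.
# clean runs when a file is staged, smudge when it is checked out.
git config filter.to-utf8.clean 'iconv -f iso-8859-1 -t utf-8'
git config filter.to-utf8.smudge 'iconv -f utf-8 -t iso-8859-1'
printf '*.php filter=to-utf8\n' > .gitattributes

# "café" in ISO-8859-1: é is the single byte E9 (octal 351).
printf 'caf\351\n' > index.php
git add .gitattributes index.php

# Hex dump of the bytes git actually staged: é should now be C3 A9.
staged_hex=$(git cat-file -p :index.php | od -An -tx1 | tr -d ' \n')
echo "$staged_hex"
```

With a setup like this the repository holds UTF-8 while the working tree stays in the editor's encoding. The filter settings live in the local git config, so every clone would need to configure them again.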
Upvotes: 1