emagar

Reputation: 1044

Editing text with GitHub collaborators who use different character encodings

(I remember reading a workaround to this issue a while back, but can't find the post!)

I collaborate with research assistants through GitHub. I work on a Linux machine; the others use macOS or Windows.

Special characters often cause problems when editing text. Is there a way to set the repository's character encoding to UTF-8 (as on my machine) while letting others rely on their native character encoding when editing files? (I suspect this is not entirely straightforward; character encoding issues always puzzle me.)

Is there a workflow that prevents text from getting garbled when I pull their work?

Thank you!

Upvotes: 1

Views: 92

Answers (2)

bk2204

Reputation: 76559

If you have text files, Git generally assumes them to be in UTF-8 unless otherwise specified. You can use other encodings, but things like git diff may or may not function as you expect if you use other encodings in the repository.

Having said that, it can be useful for people using other systems, most notably Windows, to use other encodings in the working tree. Unfortunately, even in 2020, there are still many Windows programs which fail to work with UTF-8 and are capable of operating only in little-endian UTF-16 with BOM. Allowing folks to specify their own working-tree-encoding settings on their own systems can be helpful for those people. If your project is using a language which is required to be in UTF-8, such as Go or Rust, then this is probably a non-issue, because tools which use UTF-16 just don't work.
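As a rough mental model, not Git's actual code, a per-user `working-tree-encoding=UTF-16LE-BOM` attribute (the name Git's gitattributes documentation uses for little-endian UTF-16 with a BOM) keeps the repository copy in UTF-8 and converts only that user's checkout. The sketch below uses Python's standard `codecs` module; the function names are hypothetical.

```python
import codecs

# Conceptual model only: Git performs this conversion itself when a
# working-tree-encoding attribute is set. The function names are hypothetical.


def worktree_from_repo(repo_bytes: bytes) -> bytes:
    """Repository blob (UTF-8) -> checkout bytes (UTF-16LE with a BOM)."""
    text = repo_bytes.decode("utf-8")  # strict: raises on invalid UTF-8
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def repo_from_worktree(worktree_bytes: bytes) -> bytes:
    """Checkout bytes (UTF-16LE, BOM optional) -> repository blob (UTF-8)."""
    if worktree_bytes.startswith(codecs.BOM_UTF16_LE):
        worktree_bytes = worktree_bytes[len(codecs.BOM_UTF16_LE):]
    return worktree_bytes.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    stored = "Résumé: ñandú\n".encode("utf-8")
    on_disk = worktree_from_repo(stored)
    assert on_disk.startswith(codecs.BOM_UTF16_LE)
    assert repo_from_worktree(on_disk) == stored  # lossless round trip
```

Because the conversion is lossless for valid text, you keep seeing UTF-8 on Linux while the Windows user's tools see the UTF-16 they expect.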

You can set working-tree-encoding=UTF-8 in your repository's .gitattributes file, which will cause Git to perform a no-op re-encoding and fail if the input is not valid UTF-8. That makes things slightly slower, but it protects you from users who would otherwise commit content in the wrong encoding (as long as that content isn't coincidentally valid UTF-8). Users can still override the setting in .git/info/attributes if they like.
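The attribute is one line per file type in `.gitattributes`, for example `*.md text working-tree-encoding=UTF-8` (the `*.md` pattern is an assumption; list whichever text types your project tracks). Before enabling it, it may help to see which existing files would fail. The sketch below is a standalone audit, not part of Git: it walks a directory, skips `.git`, flags files that start with a UTF-16 byte-order mark or are not valid UTF-8, and treats files containing NUL bytes as binary, which is only a heuristic.

```python
import codecs
import os

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def non_utf8_files(root: str) -> list[str]:
    """List files under root whose contents are not valid UTF-8 text.

    Not part of Git: a standalone audit to run before enabling the attribute.
    """
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into Git's internals.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            if data.startswith(UTF16_BOMS):
                bad.append(path)  # UTF-16 text with a byte-order mark
                continue
            if b"\0" in data:
                continue  # heuristic: NUL bytes usually mean a binary file
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)
```

Running `non_utf8_files(".")` from the repository root gives a list of files to convert (or to exclude from the pattern) before the attribute starts rejecting them.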

You may also wish to add an .editorconfig file that tells users' editors the files are to be in UTF-8. Some editors support EditorConfig natively and some require a plugin, but it's an option. Note that if you did that, people would have a harder time overriding the encoding on their personal systems if they wanted to.
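A minimal `.editorconfig` needs only two standard EditorConfig properties: `root = true` in the preamble and `charset = utf-8` in a section. Applying it to every file with the `[*]` section is an assumption; you could narrow it to specific extensions. The small script below writes such a file and refuses to overwrite an existing one.

```python
from pathlib import Path

# Standard EditorConfig properties; the "[*]" section (all files) is an assumption.
EDITORCONFIG = """\
root = true

[*]
charset = utf-8
"""


def write_editorconfig(repo_root: str = ".") -> Path:
    """Write .editorconfig at the repository root, refusing to overwrite one."""
    path = Path(repo_root) / ".editorconfig"
    if path.exists():
        raise FileExistsError(f"{path} already exists; edit it by hand instead")
    path.write_text(EDITORCONFIG, encoding="utf-8")
    return path
```

Calling `write_editorconfig(".")` from the repository root creates the file; commit it so every collaborator's editor picks it up.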

Upvotes: 1

Prakash Kumar
Prakash Kumar

Reputation: 11

I'm not sure, but one option is to set it in the global Git config; another way I can think of is to write a custom Git hook.
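The answer doesn't give a hook, so the following is only a hypothetical sketch of the idea: a Python `pre-commit` hook that refuses commits whose staged text is not UTF-8. It lists staged files with `git diff --cached --name-only -z --diff-filter=ACM` and reads each staged blob with `git cat-file blob :0:<path>` (stage 0 is the normal index entry). The NUL-byte binary check is a heuristic, and hooks in `.git/hooks` are not copied by `git clone`, so each collaborator would have to install it.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: refuse commits whose staged text is not UTF-8."""
import codecs
import os
import subprocess
import sys

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def bad_files(staged: dict[str, bytes]) -> list[str]:
    """Return paths whose staged bytes are UTF-16 (by BOM) or invalid UTF-8."""
    bad = []
    for path, data in staged.items():
        if data.startswith(UTF16_BOMS):
            bad.append(path)
        elif b"\0" in data:
            continue  # heuristic: NUL bytes usually mean a binary file
        else:
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)


def git(*args: str) -> bytes:
    """Run a git command and return its raw stdout, raising on failure."""
    return subprocess.run(["git", *args], check=True, capture_output=True).stdout


def staged_contents() -> dict[str, bytes]:
    """Map each added, copied, or modified staged path to its index content."""
    names = git("diff", "--cached", "--name-only", "-z", "--diff-filter=ACM")
    contents = {}
    for raw in names.split(b"\0"):
        if raw:
            path = os.fsdecode(raw)
            contents[path] = git("cat-file", "blob", ":0:" + path)
    return contents


def main() -> int:
    bad = bad_files(staged_contents())
    for path in bad:
        print(f"pre-commit: not UTF-8: {path}", file=sys.stderr)
    return 1 if bad else 0  # a non-zero exit status aborts the commit


# Act only when installed under the hook's name, so the functions above can be
# imported or tested without touching a repository.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-commit":
    sys.exit(main())
```

To try it, save the file as `.git/hooks/pre-commit` and make it executable (`chmod +x`). Keep in mind that `git commit --no-verify` skips the hook, so it is a safety net rather than an enforcement mechanism.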

Upvotes: 1
