Reputation: 791
I am using Stata 12 and have run into the following problem. I am importing a number of .csv files into Stata with the insheet command. The datasets may include Russian, Croatian, Turkish, etc. text, and I think they are encoded in UTF-8. In the .csv files the strings display correctly, but after I import them into Stata they turn into strange characters. Could you please help me with that? Can Stat/Transfer solve the problem? Does it support the .csv format?
For example, the original file looks like this:
My code is:
insheet using name.csv, c n
save name.dta, replace
The result looks like this:
I have also tried changing the script in the font options, which does not work.
Upvotes: 4
Views: 7067
Reputation: 854
Updated answer: As of version 14, all of Stata is Unicode-aware: results, help files, do-files, ado-files, data labels, etc.
This does not help users who only have access to versions of Stata before 14, but it is one kind of solution. Using the OP's example:
. insheet using "/home/Alexis/Desktop/data.csv"
(3 vars, 4 obs)
. ed
. list
+------------------------------------------------------------------------------+
| v1 v2 v3 |
|------------------------------------------------------------------------------|
1. | RU00040778 RUS ПРAЙCBOTEРXAУCKУПEРC AУДИT |
2. | RU00044434 RUS КПMГ |
3. | RU00044428 RUS Эрнст энд Янг |
4. | RU00044428 RUS Аудиторско-консулбтационная группа Раэвитие Биэнес-систем |
+------------------------------------------------------------------------------+
Upvotes: 2
Reputation: 181
As @Nick Cox commented earlier, the problem is that Stata 12 simply doesn't support Unicode/UTF-8 encoding. No, Stat/Transfer wouldn't resolve the problem (please refer to this explanation).
You can work around this by re-encoding the file with an online decoder or MS Word. Let's do it with one language first, say Russian, as in your screenshots. Then look up the correct encodings for Croatian, Turkish, and the other languages you have.
Depending on your OS, you might need to install all appropriate languages first.
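If you have many files, the same re-encoding can be scripted instead of done by hand. Below is a minimal sketch in Python, not part of the original workflow: the function name reencode_csv and the file names are hypothetical, and it assumes the source files really are UTF-8 and that your Stata 12 displays text in a Windows code page (cp1251 for Russian, cp1250 for Croatian, cp1254 for Turkish).

```python
def reencode_csv(src_path, dst_path, target_encoding="cp1251"):
    """Rewrite a UTF-8 CSV file in a single-byte code page.

    reencode_csv is a hypothetical helper for illustration.
    Assumes the source file is UTF-8 (with or without a BOM).
    """
    # "utf-8-sig" reads plain UTF-8 and also drops a leading BOM if present.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()

    # errors="replace" writes "?" for any character that does not exist
    # in the target code page instead of raising an exception.
    with open(dst_path, "w", encoding=target_encoding,
              errors="replace", newline="") as dst:
        dst.write(text)
```

For example, reencode_csv("name.csv", "name_cp1251.csv", "cp1251") would write a copy that you then read with insheet as before. A file that mixes several languages cannot be represented in a single code page, so characters outside the chosen one come out as "?".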
Hope it helps.
Upvotes: 2