Reputation: 864
How can one dodge the encoding problems when reading Stata-data into R?
The dataset I wish to read is a .dta file in either Stata 12 or Stata 13 format (before Stata introduced support for UTF-8 in version 14). Text variables with Swedish and German letters (å, ä, ö, ß) as well as other non-ASCII characters do not import well.
I have tried these answers, read.dta in foreign, the haven package (with no encoding-parameters), and now readstata13, which informs me that it expects Stata files to be encoded in CP1252. But alas, the encoding doesn't work. Should I give up and use a .csv-export as a bridge instead, or is it actually possible to read .dta-files in R?
Minimal example:
This code downloads the first few lines of my dataset and illustrates the problem, for example in the variable vocation, which contains Swedish text.
setwd("~/Downloads/")
system("curl -O http://www.lilljegren.com/stackoverflow/example.stata13.dta", intern = FALSE)
library(haven)  # read_dta() is provided by haven, not foreign
?read_dta
df1 <- read_dta('example.stata13.dta', encoding="latin1")
df2 <- read_dta('example.stata13.dta', encoding="CP1252")
library(readstata13)
df3 <- read.dta13('example.stata13.dta', fromEncoding="latin1")
df4 <- read.dta13('example.stata13.dta', fromEncoding="CP1252")
df5 <- read.dta13('example.stata13.dta', fromEncoding="utf-8")
vocation <- c("Brandkorpral","Sömmerska","Jungfru","Timmerman","Skomakare","Skräddare","Föreståndare","Platsförsäljare","Sömmerska")
df4$vocation == vocation
# [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
Upvotes: 3
Views: 2848
Reputation: 864
The correct encoding for reading files generated by Stata prior to version 14 on a Mac is "macroman":
df <- read.dta13('example.stata13.dta', fromEncoding="macroman")
On my Mac, .dta-files in both stata13 and stata12 formats (the latter saved by saveold in Stata 13) imported nicely like this.
Supposedly, the manual of readstata13 correctly assumes "CP1252" on other platforms. For me, however, "macroman" did the trick (also for the .csv-files that Stata 13 generated with export delimited).
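To illustrate why the choice between "macroman" and "CP1252" matters, here is a minimal sketch in base R that decodes the same bytes both ways. The byte string is hand-built for illustration and is not taken from the example file. The sketch assumes your iconv build knows the alias "MACINTOSH" for Mac OS Roman; accepted encoding names are platform-dependent, so check iconvlist() if the call fails.

```r
# Hand-built bytes for "Sömmerska" as Mac OS Roman stores it.
# In Mac OS Roman, 0x9A is "ö"; in CP1252 the same byte is "š".
raw_bytes <- as.raw(c(0x53, 0x9A, 0x6D, 0x6D, 0x65, 0x72, 0x73, 0x6B, 0x61))
x <- rawToChar(raw_bytes)

# Decode the identical bytes under two different assumptions.
# "MACINTOSH" is assumed here as the iconv name for Mac OS Roman.
as_macroman <- iconv(x, from = "MACINTOSH", to = "UTF-8")
as_cp1252   <- iconv(x, from = "CP1252", to = "UTF-8")

# Only the Mac OS Roman decoding reproduces the intended word.
print(as_macroman == "S\u00f6mmerska")
print(as_cp1252 == "S\u00f6mmerska")
```

If a file written on a Mac is decoded as CP1252, every non-ASCII letter lands on a different character, which would be consistent with the FALSE entries in the comparison from the question.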
Upvotes: 4