nJGL

How to handle encoding when reading .dta files created by Stata versions prior to 14 into R?

How can one avoid encoding problems when reading Stata data into R?

The dataset I wish to read is a .dta file in either Stata 12 or Stata 13 format (that is, from before Stata introduced UTF-8 support in version 14). Text variables containing Swedish and German letters such as å, ä, ö and ß, as well as other non-ASCII characters, do not import correctly.

I have tried the answers I found: read.dta in foreign, the haven package (with no encoding parameters), and now the readstata13 package, whose documentation for read.dta13 says it expects pre-14 Stata files to be encoded in CP1252. But alas, none of these encodings works. Should I give up and use a .csv export as a bridge instead, or is it actually possible to read these .dta files in R?

Minimal example:
This code downloads the first few lines of my dataset and illustrates the problem, for example in the variable vocation, which contains words in Scandinavian languages.

setwd("~/Downloads/")
# Download the example file; mode = "wb" keeps the binary .dta intact on Windows
download.file("http://www.lilljegren.com/stackoverflow/example.stata13.dta",
              destfile = "example.stata13.dta", mode = "wb")

library(haven)        # read_dta() comes from haven, not foreign
df1 <- read_dta('example.stata13.dta', encoding="latin1")
df2 <- read_dta('example.stata13.dta', encoding="CP1252")

library(readstata13)  # provides read.dta13()
df3 <- read.dta13('example.stata13.dta', fromEncoding="latin1")
df4 <- read.dta13('example.stata13.dta', fromEncoding="CP1252")
df5 <- read.dta13('example.stata13.dta', fromEncoding="utf-8")

# Expected values of the first rows of vocation
vocation <- c("Brandkorpral","Sömmerska","Jungfru","Timmerman","Skomakare","Skräddare","Föreståndare","Platsförsäljare","Sömmerska")
df4$vocation == vocation
# [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
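One way to narrow down the source encoding is to decode the raw bytes of a value you know should read "Sömmerska" under several candidate encodings and see which one reproduces it. This is a minimal sketch: `decode_candidates()` is a hypothetical helper, and the bytes are constructed by hand (Mac OS Roman stores ö as 0x9A) rather than read from the file. It uses the iconv name "macintosh" for Mac OS Roman, on the assumption that this alias is available in most R builds; check `iconvlist()` on your system.

```r
# Hypothetical helper: decode the same bytes under several candidate
# encodings and report which one reproduces the expected string.
decode_candidates <- function(bytes, expected,
                              encodings = c("latin1", "CP1252", "macintosh")) {
  decoded <- vapply(encodings,
                    function(enc) iconv(list(bytes), from = enc, to = "UTF-8"),
                    character(1), USE.NAMES = FALSE)
  data.frame(encoding = encodings,
             decoded = decoded,
             matches = !is.na(decoded) & decoded == expected,
             stringsAsFactors = FALSE)
}

# "Sömmerska" as Mac OS Roman bytes: ö is stored as 0x9A
bytes <- as.raw(c(0x53, 0x9A, 0x6D, 0x6D, 0x65, 0x72, 0x73, 0x6B, 0x61))

# \u00f6 is ö; the escape keeps the comparison independent of the script's encoding
res <- decode_candidates(bytes, expected = "S\u00f6mmerska")
print(res)
```

Under CP1252 the byte 0x9A decodes to "š", so a Mac-written file read as CP1252 yields "Sšmmerska" instead of "Sömmerska", which is consistent with the FALSE entries above.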

Answers (1)

nJGL

The correct encoding for reading files generated by Stata prior to version 14 on Macs is "macroman":

library(readstata13)
df <- read.dta13('example.stata13.dta', fromEncoding="macroman")

On my Mac, .dta files in both Stata 13 and Stata 12 format (the latter saved with saveold in Stata 13) imported correctly this way.
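If strings have already been imported without any conversion (raw Mac OS Roman bytes), a similar effect can be had by re-encoding the character columns afterwards. This is a sketch under that assumption: `recode_strings()` is a hypothetical helper, not part of readstata13, and it uses the iconv alias "macintosh" for the same Mac OS Roman encoding that macOS also accepts as "macroman" (assumed to be the more portable name; check `iconvlist()`).

```r
# Hypothetical helper (not part of readstata13): re-encode every character
# column of a data frame from a legacy single-byte encoding to UTF-8.
recode_strings <- function(df, from = "macintosh") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = "UTF-8")
  df
}

# Simulate strings imported as raw Mac OS Roman bytes (ö = 0x9A, ä = 0x8A)
raw_vocation <- c(
  rawToChar(as.raw(c(0x53, 0x9A, 0x6D, 0x6D, 0x65, 0x72, 0x73, 0x6B, 0x61))),
  rawToChar(as.raw(c(0x53, 0x6B, 0x72, 0x8A, 0x64, 0x64, 0x61, 0x72, 0x65)))
)
df <- data.frame(id = 1:2, vocation = raw_vocation, stringsAsFactors = FALSE)

fixed <- recode_strings(df)
# Compare against "Sömmerska" and "Skräddare" written with \u escapes
print(fixed$vocation == c("S\u00f6mmerska", "Skr\u00e4ddare"))
```

Non-character columns (here `id`) are left untouched; iconv ignores any declared encoding on the input and converts the bytes as if they were in `from`.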

The readstata13 documentation assumes "CP1252", which is presumably correct for files written on other platforms. For me, however, "macroman" did the trick, also for the .csv files that Stata 13 generated with export delimited.
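For the .csv route, base R's `read.csv()` can re-encode while reading via its `fileEncoding` argument. A minimal sketch, assuming a UTF-8 session locale and the iconv alias "macintosh" for Mac OS Roman; the file below is built by hand to mimic a legacy export, not produced by Stata.

```r
# Build a tiny CSV whose bytes are Mac OS Roman (ö = 0x9A), mimicking a
# legacy export; the contents are constructed here, not taken from Stata.
path <- tempfile(fileext = ".csv")
csv_bytes <- c(charToRaw("id,vocation\n1,S"), as.raw(0x9A), charToRaw("mmerska\n"))
writeBin(csv_bytes, path)

# fileEncoding makes the connection re-encode from Mac OS Roman while reading
csv <- read.csv(path, fileEncoding = "macintosh", stringsAsFactors = FALSE)
print(csv$vocation == "S\u00f6mmerska")
```

Reading the same file with `fileEncoding = "CP1252"` would instead turn the 0x9A byte into "š".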
