We are working in Stata with data created in R and exported using the haven package.
We ran into an issue with variables that have a dot in their names. To reproduce the problem, here is some minimal R code:
library("haven")
var.1 <- c(1,2,3)
var_2 <- c(1,2,3)
test_df <- data.frame(var.1, var_2)
str(test_df)
write_dta(test_df, "D:/test_df.dta")
Now, in Stata, when I do:
use "D:\test_df.dta"
d
First problem: I get an empty dataset. Second problem: the variable name contains a dot, which should be illegal in Stata. As a result, any command that refers to the variable directly by name, such as
drop var.1
returns an error:
factor variables and time-series operators not allowed
r(101);
What is causing such behaviour? Any solutions to this problem?
Upvotes: 2
Views: 809
Reputation: 38510
This will drop var.1 in Stata:
drop var?1
Here (as in Excel), ? is used as a wildcard for a single character. (It is the equivalent of . in a regular expression.)
Unfortunately, this will also drop var_1, if it exists.
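One way to avoid relying on the wildcard is to rename the dotted columns in R before exporting. The following is a minimal sketch in base R, assuming that replacing each dot with an underscore is acceptable for your data; the clean_names variable is only illustrative and is not part of haven or readstata13.

```r
# Example data frame from the question: one dotted name, one underscore name.
test_df <- data.frame(var.1 = c(1, 2, 3), var_2 = c(1, 2, 3))

# Replace every literal dot with an underscore. fixed = TRUE stops "." from
# being interpreted as the regular expression "any single character".
clean_names <- gsub(".", "_", names(test_df), fixed = TRUE)

# Stop if the renaming would create duplicates (e.g. var.1 and var_1 both
# present), since Stata variable names must be unique.
stopifnot(!anyDuplicated(clean_names))

names(test_df) <- clean_names
print(names(test_df))
```

After this step the data frame can be written out as before, and the variables can be referred to by name in Stata without a wildcard.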
I am not sure what causes the missing values when writing a .dta file with haven, but I am able to replicate this result with Stata 14.1 and haven 0.2.0.
However, using the read_dta function from haven,
temp2 <- read_dta("test_df.dta")
returns the data.frame. As an alternative to haven, I have used the readstata13 package in the past without issues.
library(readstata13)
save.dta13(test_df, "testdf.dta")
While this code has the same variable name issue, it produced a .dta file that contained the correct values when read into Stata 14.1. save.dta13 has a convert.underscore argument that is intended to remove characters that are not valid in Stata variable names. I verified that it works properly in this example with readstata13 version 0.8.5, but it had a bug in some earlier versions, including 0.8.2.
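For reference, a call using that argument might look like the sketch below. It assumes a readstata13 version in which convert.underscore behaves as described (0.8.5 in my test); the temporary output path and the check object are only for illustration.

```r
library(readstata13)

# Same example data as in the question.
test_df <- data.frame(var.1 = c(1, 2, 3), var_2 = c(1, 2, 3))

# Write to a temporary location; adjust the path as needed.
out_file <- file.path(tempdir(), "testdf.dta")

# convert.underscore = TRUE asks save.dta13 to replace characters that are
# not valid in Stata variable names.
save.dta13(test_df, out_file, convert.underscore = TRUE)

# Read the file back into R to inspect the variable names that were written.
check <- read.dta13(out_file)
print(names(check))
```

Reading the file back in R is a quick way to confirm the names before opening it in Stata.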
Upvotes: 4