Stefano Grillini

Reputation: 25

Convert dates from Stata to R

I am having difficulty converting a vector of integers into dates.

I've imported a dataset from Stata using:

> dataire <- read.dta13("~/lcapm_ireland.dta", convert.factors = TRUE,
 generate.factors = FALSE, encoding = "UTF-8", fromEncoding = NULL, 
convert.underscore = FALSE, missing.type = FALSE, convert.dates = TRUE, 
replace.strl = TRUE, add.rownames = FALSE)

My date variable is a monthly time series starting in January 2000 and formatted as "2000-Jan".

Like R, Stata stores dates as integers, but for monthly dates Stata counts months from January 1960, which is month zero. Thus, when importing the dataset into R, I end up with a vector of integers of the form:

> c(478, 479, 480, ...)

In addition, the class of my date variable is:

> class(datem)
[1] "Date"

How can I use as.Date or other functions to transform this series of integers into a monthly date variable formatted as "%Y-%b"?

Upvotes: 2

Views: 4356

Answers (2)

spindash_st

Reputation: 442

This is simpler, but you will get a full date that includes a day, such as 1990-03-01.

You have a column vector of integers, DATE_IN_MONTHS, that count months since Stata's origin for monthly dates, January 1960 (1960-01-01). In R the origin is 1970-01-01.

With the lubridate package, one simply starts from Stata's origin and then adds the months:

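library(dplyr)      # provides %>% and mutate()
library(lubridate)  # provides ymd() and months() for numeric input
db <- haven::read_dta('StataDatabase.dta') %>%
        mutate(DATE_IN_MONTHS = ymd("1960-01-01") + months(DATE_IN_MONTHS))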

Now db$DATE_IN_MONTHS contains c(1990-03-01, 1990-04-01, 1990-05-01,...) where each element is a date in R.
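
The same conversion can be sketched in base R without lubridate, using integer division on the month count. The helper name stata_month_to_date is hypothetical, and the abbreviated month name produced by "%b" depends on the session locale. If the imported column already has class "Date", as.numeric() would presumably be needed first to recover the raw integers, assuming the import left them unchanged:

```r
# Hypothetical helper: convert Stata monthly values (months since January 1960)
# to R Dates pinned to the first day of each month.
stata_month_to_date <- function(m) {
  year  <- 1960 + m %/% 12   # whole years elapsed since 1960
  month <- m %% 12 + 1       # month of the year, 1 to 12
  as.Date(sprintf("%04d-%02d-01", as.integer(year), as.integer(month)),
          format = "%Y-%m-%d")
}

months_since_1960 <- c(478, 479, 480)
dates <- stata_month_to_date(months_since_1960)
dates                              # "1999-11-01" "1999-12-01" "2000-01-01"

# Character labels in year-month form; month abbreviations follow the locale.
labels <- format(dates, "%Y-%b")
```

Note that labels is a character vector, so it is suitable for display but not for date arithmetic; dates keeps the Date class.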

Upvotes: 2

user8682794

Reputation:

The short answer is that you can't get exactly what you want. This is because in R, a value of class Date always includes a day.

To import a Stata date into R successfully, you can first convert the variable in Stata from a monthly date to a date-time:

clear
set obs 1

generate date = monthly("2000-Jan", "YM")

display %tmCCYY-Mon date
2000-Jan

display date
480

replace date = dofm(date)

display %tdCCYY-Mon date
2000-Jan

display date
14610

replace date = cofd(date) + tc(00:00:35)

display %tc date
01jan2000 00:01:40

display %15.0f date
1262304100352

Then in R you can do the following:

statadatetime <-  1262304100352

rdatetime <- as.POSIXct(statadatetime/1000, origin = "1960-01-01")
rdatetime
[1] "2000-01-01 02:01:40 EET"

typeof(rdatetime)
[1] "double"

rdate <- as.Date(rdatetime)
rdate
[1] "2000-01-01"

typeof(rdate)
[1] "double"
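
As a small sketch of why the division by 1000 is there: Stata date-times count milliseconds since 1960-01-01 00:00:00, while POSIXct counts seconds. The variable names below are illustrative, and passing tz = "UTC" is an assumption made here so that the printed time does not shift with the local time zone (the "EET" output above reflects the answerer's own locale):

```r
# Stata date-time value in milliseconds since 1960-01-01 00:00:00.
stata_tc <- 1262304100352

# Convert milliseconds to seconds and interpret them relative to Stata's origin.
rdatetime_utc <- as.POSIXct(stata_tc / 1000, origin = "1960-01-01", tz = "UTC")

format(rdatetime_utc, "%Y-%m-%d %H:%M:%S")   # "2000-01-01 00:01:40"
as.Date(rdatetime_utc)                       # "2000-01-01"
```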

You can get the Year-(abbreviated) Month form you want with the following:

rdate <- format(rdate, "%Y-%b")
rdate
[1] "2000-Jan"

typeof(rdate)
[1] "character"

However, as you can see, this changes rdate from a date into a character string.

Trying to change it back, you get:

rdate <- as.Date(rdate)
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
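
The error arises because "2000-Jan" has no day component, so it does not describe a complete date. A minimal sketch of one workaround, assuming it is acceptable to pin every month to its first day, is to append a day before parsing. The variable names are illustrative only, and parsing "%b" depends on the session locale:

```r
# Build the year-month label from a Date (the result is a character string).
ym_label <- format(as.Date("2000-01-01"), "%Y-%b")

# Append an assumed day of "01" so the string describes a complete date,
# then parse it with a matching format.
back_to_date <- as.Date(paste0(ym_label, "-01"), format = "%Y-%b-%d")

class(back_to_date)   # "Date"
back_to_date          # "2000-01-01"
```

Keeping the Date column for computation and calling format() only when displaying or labelling avoids the round trip altogether.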

Upvotes: 2
