stalefriedrice

Reputation: 29

How to convert integer to date format in R?

I am trying to convert integer data from my data frame in R, to date format.

The data is in a column named svcg_cycle in the orig_svcg_filtered data frame.

The original data looks something like 200502, 200503, and so forth, and I want to turn it into yyyy-mm-dd format.

I am trying to use this code:

as.Date(orig_svcg_filtered$svcg_cycle, origin = "2000-01-01")

but the output is not something that I expected:

[1] "2548-12-15" "2548-12-15" "2548-12-15" "2548-12-15" "2548-12-15" 

while it is supposed to be 2005-02-01, 2005-03-01, and so forth.

How can I solve this?

Upvotes: 0

Views: 12773

Answers (2)

Mako212

Reputation: 7292

I like to use regex to fix these kinds of string formatting issues. By default, as.Date only tries a couple of standard formats such as YYYY-MM-DD (and YYYY/MM/DD). origin is used when you have a numeric date (i.e. a number of days from some reference point), but in this case your value is not really a numeric date; it's just a date written as a string of digits.

We simply separate the year and month with a dash and add a day, in this case the first of the month, to make it a valid date (a Date object in R must have a day). The regex captures the first four digits in group one and the final two digits in group two. We then combine the two groups, separated by a dash, and append the day.

as.Date(gsub("^(\\d{4})(\\d{2})", "\\1-\\2-01", x))

[1] "2005-02-01" "2005-03-01"

You don't need to specify format in this case, because YYYY-MM-DD is one of the standard formats as.Date checks. If you want to be explicit, the matching argument is format = "%Y-%m-%d".
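As a minimal, self-contained sketch of the regex approach (the sample vector x is an assumption borrowed from the values in the question, and the intermediate name s is only illustrative), the same idea with the format given explicitly might look like this:

```r
# Sample input, assumed to mirror the yyyymm values from the question
x <- c(200502, 200503)

# sub() coerces x to character, then the two capture groups
# (4-digit year, 2-digit month) are rejoined with dashes and a fixed day
s <- sub("^(\\d{4})(\\d{2})$", "\\1-\\2-01", x)

# format is optional here, but spelling it out documents the intent
d <- as.Date(s, format = "%Y-%m-%d")
print(d)
```

Anchoring the pattern with $ as well as ^ means a value that is not exactly six digits is left unchanged by sub() rather than partially rewritten.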

Upvotes: 0

duckmayr

Reputation: 16920

If you have

x <- c(200502, 200503)

Then

as.Date(x, origin = "2000-01-01")

tells R you want the dates that fall 200,502 and 200,503 days after 2000-01-01. From help("as.Date"):

as.Date will accept numeric data (the number of days since an epoch), but only if origin is supplied.

So, integer data gives days after the origin supplied, not some sort of numeric code for the dates like 200502 for "2005-02-01".
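To make the "days after the origin" behaviour concrete, here is a small sketch using a few illustrative offsets (the variable names are hypothetical, not from the question):

```r
origin <- "2000-01-01"

# 0 days after the origin is the origin itself
d0 <- as.Date(0, origin = origin)

# January has 31 days, so an offset of 31 lands on 1 February
d31 <- as.Date(31, origin = origin)

# 2000 is a leap year (366 days), so an offset of 366 lands on 1 January 2001
d366 <- as.Date(366, origin = origin)

# The value from the question is read as 200,502 days, i.e. roughly 549 years
d_far <- as.Date(200502, origin = origin)

print(c(d0, d31, d366, d_far))
# [1] "2000-01-01" "2000-02-01" "2001-01-01" "2548-12-15"
```

The last value reproduces the unexpected "2548-12-15" from the question.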

What you want is

as.Date(paste(substr(x, 1, 4), substr(x, 5, 6), "01", sep = "-"))

# [1] "2005-02-01" "2005-03-01"

The

paste(substr(x, 1, 4), substr(x, 5, 6), "01", sep = "-")

part takes your integers and creates strings like

# [1] "2005-02-01" "2005-03-01"

Then as.Date() knows how to deal with them.

You could alternatively do something like

as.Date(paste0(x, "01"), format = "%Y%m%d")

# [1] "2005-02-01" "2005-03-01"

Here paste0() coerces each element to character and appends "01" (for the day); the format argument then tells as.Date() how to parse the resulting string. (See help("as.Date") and help("strptime").)
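Applied to the data frame from the question, a sketch could look like the following. The data frame contents here are made up for illustration, and svcg_date is a hypothetical name for the new column:

```r
# Stand-in for the real data; only the column name comes from the question
orig_svcg_filtered <- data.frame(svcg_cycle = c(200502L, 200503L, 200512L))

# Append the day, then parse the eight-digit string as year, month, day
orig_svcg_filtered$svcg_date <- as.Date(
  paste0(orig_svcg_filtered$svcg_cycle, "01"),
  format = "%Y%m%d"
)

str(orig_svcg_filtered)
```

Values that do not form a valid date under this format (for example a month of 13) come back as NA rather than raising an error, so it may be worth checking anyNA(orig_svcg_filtered$svcg_date) afterwards.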

Upvotes: 4
