matt_jay

Reputation: 1271

Converting date in seconds since origin in R

I have a series of dates that appear to be defined in seconds since Jan 1, 1960.

'data.frame':   5 obs. of  1 variable:
$ original: int  1624086000 1624086000 1508137200 1508137200 1508137200

(for reproduction:)

# Name the column directly; setnames() would require the data.table package
data <- data.frame(original = c(1624086000,1624086000,1508137200,1508137200,1508137200))

I would like to convert these to dates in the format %Y-%m-%d.

I wrote the following code for this:

uniqueDates <- as.data.frame(unique(data))

uniqueDates$converted <- sapply(uniqueDates$original, function(x) as.Date(as.POSIXct(x, origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d"))

The result is a vector of five-digit numbers rather than dates:

> str(uniqueDates$converted)
num [1:2] 15144 13802

If I just run

as.Date(as.POSIXct(1624086000, origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d")

I get the desired result:

[1] "2011-06-19"

What am I doing wrong that results in five-digit numeric values instead of Date objects?

Upvotes: 0

Views: 1330

Answers (1)

Pierre L

Reputation: 28461

as.Date(as.POSIXct(data[,1], origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d")
[1] "2011-06-19" "2011-06-19" "2007-10-16" "2007-10-16" "2007-10-16"

The function is already vectorized, so there is no need for sapply here. Use the apply family only if you have multiple columns of dates. If you want to avoid the long anonymous function, you can define the function first and then use it in whichever way suits your case:

as.ymd <- function(x) {
  as.Date(as.POSIXct(x, origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d")
}

Now you can convert the dates whether you have a single vector or a data frame with several columns:

# Name the columns directly; setnames() would require the data.table package
data2 <- data.frame(original = c(1624086000,1624086000,1508137200,1508137200,1508137200), second = c(1624086000,1624086000,1508137200,1508137200,1508137200))

as.ymd(data2[,1])
[1] "2011-06-19" "2011-06-19" "2007-10-16" "2007-10-16" "2007-10-16"

data2[] <- lapply(data2, as.ymd)
data2
    original     second
1 2011-06-19 2011-06-19
2 2011-06-19 2011-06-19
3 2007-10-16 2007-10-16
4 2007-10-16 2007-10-16
5 2007-10-16 2007-10-16

The five-digit numeric output comes from sapply's simplification step: when it collapses the list of results into a vector, the Date class attribute is dropped, leaving only the underlying number of days since 1970-01-01. For comparison, try adding the argument simplify=FALSE to the first call that you tried.
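
As a rough illustration of the simplification behaviour described above, the sketch below compares the default sapply call with simplify = FALSE. The helper name to_date and the variable names are made up for this example; the values are the two unique inputs from the question.

```r
# Seconds since 1960-01-01, taken from the question
secs <- c(1624086000, 1508137200)

# Hypothetical helper, equivalent in spirit to as.ymd() above
to_date <- function(x) {
  as.Date(as.POSIXct(x, origin = "1960-01-01", tz = "GMT"), tz = "GMT")
}

# Default sapply(): the list of results is simplified to a bare numeric
# vector, so the "Date" class is lost
simplified <- sapply(secs, to_date)
class(simplified)   # "numeric"
simplified          # 15144 13802, i.e. days since 1970-01-01

# simplify = FALSE returns a list whose elements are still Date objects
kept <- sapply(secs, to_date, simplify = FALSE)
class(kept[[1]])    # "Date"

# One way to get a single Date vector back from that list
dates <- do.call(c, kept)

# Alternatively, reattach the class to the simplified numbers
restored <- as.Date(simplified, origin = "1970-01-01")
```

Both dates and restored should print as "2011-06-19" "2007-10-16", though calling the vectorized function directly remains the simpler route.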

You can work around it with strftime, since it returns a character vector, which sapply can simplify without losing anything. The drawback is that you are left with character strings rather than a proper date class such as Date, POSIXct or POSIXlt.
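
A minimal sketch of the strftime workaround, assuming the same two input values as in the question; the variable names are invented for illustration. The time zone is passed explicitly to strftime so the result does not depend on the local session settings.

```r
# Seconds since 1960-01-01, taken from the question
secs <- c(1624086000, 1508137200)

# strftime() returns character, which sapply() can simplify safely
as_text <- sapply(secs, function(x) {
  strftime(as.POSIXct(x, origin = "1960-01-01", tz = "GMT"),
           format = "%Y-%m-%d", tz = "GMT")
})
class(as_text)   # "character"
as_text          # "2011-06-19" "2007-10-16"

# Convert back to Date if date arithmetic is needed later
as_dates <- as.Date(as_text)
class(as_dates)  # "Date"
```

The round trip through character is only worthwhile if sapply is needed for some other reason; otherwise the vectorized conversion keeps the Date class throughout.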

Upvotes: 1
