Captain_Obvious

Reputation: 540

Why does lubridate's parse_date_time work with lapply, but fail with sapply?

Given: the following 4x2 dataframe

df <- as.data.frame(
  stringsAsFactors = FALSE,
  matrix(
    c("2014-01-13 12:08:02", "2014-01-13 12:19:46",
      "2014-01-14 09:59:09", "2014-01-14 10:05:09",
      "6-18-2016 17:43:42",  "6-18-2016 18:06:59",
      "6-27-2016 12:16:47",  "6-27-2016 12:29:05"),
    nrow = 4, ncol = 2, byrow = TRUE
  )
)
colnames(df) <- c("starttime", "stoptime")

Goal: the same dataframe but with all the values replaced by the return value of the following lubridate function call:

library(lubridate)

f <- function(column) {
  parse_date_time(column, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ")
}

Here's the sapply call, whose result contains strange integers:

df2 <- sapply(df, FUN = f) # has values like `1467030545`

And here's the lapply call, which works as expected:

df2 <- lapply(df, FUN = f) # has values like `2016-06-27 12:29:05`

I understand sapply returns the simplest data structure it can while lapply returns a list. I was prepared to follow up the sapply call with df2 <- data.frame(df2) to end up with a data frame as desired. My question is:

Why does the parse_date_time function behave as expected in the lapply but not in the sapply?

Upvotes: 4

Views: 592

Answers (1)

akrun

Reputation: 887531

The reason is that sapply has simplify = TRUE by default, so when the list elements all have the same length or dimensions, it simplifies the result to a vector or matrix. Internally, date-time classes are stored as numeric,

typeof(parse_date_time(df$starttime, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ"))
#[1] "double"

while the class is 'POSIXct'

class(parse_date_time(df$starttime, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ"))
#[1] "POSIXct" "POSIXt"  

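so when sapply simplifies to a matrix, the class attribute is dropped and only the underlying numeric values (seconds since 1970-01-01 UTC) remain, whereas the list returned by lapply preserves the class of each element.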
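The same effect can be reproduced with base R alone. Here is a minimal sketch, assuming toy data built with as.POSIXct (the names cols, simplified and kept are made up for illustration and are not part of the question):

```r
# Two small date-time columns built with base R (hypothetical toy data)
cols <- list(
  a = as.POSIXct(c("2014-01-13 12:08:02", "2014-01-14 09:59:09"), tz = "UTC"),
  b = as.POSIXct(c("2014-01-13 12:19:46", "2014-01-14 10:05:09"), tz = "UTC")
)

simplified <- sapply(cols, identity)  # simplify = TRUE: collapsed to a matrix
kept       <- lapply(cols, identity)  # stays a list of POSIXct vectors

is.matrix(simplified)             # the result is a plain numeric matrix
inherits(simplified, "POSIXct")   # the date-time class is gone
inherits(kept$a, "POSIXct")       # the list element still has its class

# The numbers are seconds since 1970-01-01 UTC, so a column can be converted back
restored <- as.POSIXct(simplified[, "a"], origin = "1970-01-01", tz = "UTC")
```

Converting back like this is possible, but it has to be done column by column, which is why going through lapply is usually simpler.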

If a data.frame is what we want, we can create a copy of 'df' and assign into it with [] so that the result keeps the same structure as 'df'

df2 <- df
df2[] <- lapply(df, FUN = function(column) {
  parse_date_time(column, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ")
})

df2
#           starttime            stoptime
#1 2014-01-13 12:08:02 2014-01-13 12:19:46
#2 2014-01-14 09:59:09 2014-01-14 10:05:09
#3 2016-06-18 17:43:42 2016-06-18 18:06:59
#4 2016-06-27 12:16:47 2016-06-27 12:29:05
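To see what the [] on the left-hand side is doing, here is a small sketch with a hypothetical toy data frame (df_demo, plain and kept are illustrative names, not from the question). Without [], the result of lapply is just a list; with [], the columns of the existing data.frame are replaced in place:

```r
# Hypothetical toy data frame with character columns
df_demo <- data.frame(x = c("1", "2"), y = c("3", "4"),
                      stringsAsFactors = FALSE)

# Plain assignment: lapply returns a list, not a data.frame
plain <- lapply(df_demo, as.numeric)

# Assigning into kept[] replaces the columns but keeps the data.frame structure
kept <- df_demo
kept[] <- lapply(df_demo, as.numeric)

is.data.frame(plain)  # FALSE
is.data.frame(kept)   # TRUE
```

The same pattern applies to the date-time conversion above: each column keeps its POSIXct class because no simplification to a matrix takes place.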

Upvotes: 5
