Reputation: 1197
Using dplyr’s “verbs,” how can I apply a (general) function to a column of an R data frame, if that function depends on multiple columns of the data frame?
Here’s a concrete example of the type of situation that I face. I have a data frame like this:
df <- data.frame(
  d1 = c('2016-01-30 08:40:00 UTC', '2016-03-06 09:30:00 UTC'),
  d2 = c('2016-01-30 16:20:00 UTC', '2016-03-06 13:20:00 UTC'),
  tz = c('America/Los_Angeles', 'America/Chicago'),
  stringsAsFactors = FALSE
)
I want to convert the UTC times to local times, to get a data frame like this:
                   d1                  d2                  tz
1 2016-01-30 00:40:00 2016-01-30 08:20:00 America/Los_Angeles
2 2016-03-06 03:30:00 2016-03-06 07:20:00     America/Chicago
To do this, I would like to apply the following function, which converts UTC time to local time using the lubridate library, to the date columns:
library(lubridate)
getLocTime <- function(d, tz) {
  as.character(with_tz(ymd_hms(d), tz))
}
Using dplyr, it seems that the transformation
df %>% mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz))
should do the trick. However, it fails with the error: Error in eval(expr, envir, enclos): invalid 'tz' value
The only way I've managed to do the conversion to local time is with the rather ungainly assignment
df[c('d1', 'd2')] <- lapply(c('d1', 'd2'),
  function(x) unlist(Map(getLocTime, df[[x]], df$tz)))
Is there in fact a natural way to perform this transformation using dplyr idioms?
Upvotes: 0
Views: 1586
Reputation: 20399
As mentioned by lukeA, the problem occurs because getLocTime is not vectorized: with_tz accepts only a single time zone, but inside mutate the tz argument is the entire column. So either you vectorize the function as proposed, or you apply your function rowwise:
df %>% rowwise() %>% mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz))
which makes sure that getLocTime is called with a single value from each column, one row at a time, and not with whole vectors. I leave it up to you to determine which approach is faster.
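For the vectorized route, a minimal sketch could look like the following. It assumes the df and getLocTime from the question; getLocTimeVec is a hypothetical helper name, and mapply is only one of several ways to pair each timestamp with its own time zone.

```r
library(dplyr)
library(lubridate)

# Data frame and scalar conversion function as given in the question
df <- data.frame(
  d1 = c('2016-01-30 08:40:00 UTC', '2016-03-06 09:30:00 UTC'),
  d2 = c('2016-01-30 16:20:00 UTC', '2016-03-06 13:20:00 UTC'),
  tz = c('America/Los_Angeles', 'America/Chicago'),
  stringsAsFactors = FALSE
)

getLocTime <- function(d, tz) {
  as.character(with_tz(ymd_hms(d), tz))
}

# Hypothetical wrapper (not from the original post): calls getLocTime once
# per (d, tz) pair, so with_tz only ever sees a single time zone.
# USE.NAMES = FALSE keeps the result an unnamed character vector.
getLocTimeVec <- function(d, tz) {
  mapply(getLocTime, d, tz, USE.NAMES = FALSE)
}

# No rowwise() needed: the wrapper accepts whole columns
res <- df %>%
  mutate(d1 = getLocTimeVec(d1, tz), d2 = getLocTimeVec(d2, tz))
```

The wrapper still loops over the rows internally, so it is unlikely to be dramatically faster than rowwise(); its main advantage is that the result stays an ordinary, ungrouped data frame.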
Upvotes: 3