Reputation: 31
I have a data frame of event records with start times and end times. I want to expand each record into multiple records at consistent time intervals, say one hour. For example, suppose the data frame contains two records:
EventId Day StartTime EndTime
1 Mon 1 3
2 Tues 2 5
My desired new data frame should look like this:
EventId Day Time
1 Mon 1
1 Mon 2
2 Tues 2
2 Tues 3
2 Tues 4
My function uses data.table::setDT to expand the records like this:
makeIncrementalRecords <- function(df) {
  # Convert to a data.table and build one row per time step within each event
  new <- data.table::setDT(df)[,
                               .(Time = seq(StartTime,
                                            EndTime,
                                            by = 1)),
                               by = .(EventId, Day)]
  new
}
Executing this line by line, I have no issues, and I get the result I want. When I execute the function as part of a loaded package, I get the following error...
Error in x[j]: invalid subscript type 'list'
I am completely baffled as to why this code would suddenly stop working when executed as a function. I guess it has something to do with the local function environment. It works when I create the data frame as 'df' in the global environment and just execute the setDT function in the console.
Any suggestions?
Thanks
Upvotes: 0
Views: 210
Reputation: 8120
Here is a solution similar to that proposed in the comments.
library(tidyverse)
makeIncrementalRecords <- function(data){
data %>%
mutate(Time = map2(StartTime, EndTime, ~seq(.x, .y-1))) %>%
unnest(Time) %>%
select(EventId, Day, Time)
}
makeIncrementalRecords(df)
# EventId Day Time
# 1 1 Mon 1
# 2 1 Mon 2
# 3 2 Tues 2
# 4 2 Tues 3
# 5 2 Tues 4
Or, if you want to keep it with data.table:
makeIncrementalRecords2 <- function(data){
data.table::setDT(data)[, .(Time = seq(StartTime, EndTime-1, by = 1)), by = .(EventId, Day)]
}
makeIncrementalRecords2(df)
# EventId Day Time
# 1: 1 Mon 1
# 2: 1 Mon 2
# 3: 2 Tues 2
# 4: 2 Tues 3
# 5: 2 Tues 4
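A hedged note on the original error: `invalid subscript type 'list'` is the message you would typically see when `[` falls back to data.frame semantics, which data.table does for code that lives in a package whose NAMESPACE does not import data.table. The question does not show the package's NAMESPACE, so this is an assumption about the cause, not a confirmed diagnosis. Under that assumption, a minimal sketch of a package-friendly version might look like the following. The name `make_incremental_records_dt` is made up for illustration, and `as.data.table()` is used instead of `setDT()` so the caller's data frame is not modified by reference.

```r
library(data.table)

# Inside a package, library() would be replaced by listing data.table under
# Imports in DESCRIPTION and adding import(data.table) to NAMESPACE
# (for example via the roxygen tag: #' @import data.table).

# Hypothetical helper name; column names are taken from the question.
make_incremental_records_dt <- function(data) {
  dt <- as.data.table(data)  # copies, so `data` itself stays a data.frame
  # One row per hour from StartTime up to (but not including) EndTime
  dt[, .(Time = seq(StartTime, EndTime - 1L, by = 1L)), by = .(EventId, Day)]
}

df <- data.frame(EventId = 1:2, Day = c("Mon", "Tues"),
                 StartTime = 1:2, EndTime = c(3L, 5L))
res <- make_incremental_records_dt(df)
print(res)
```

Note that `seq()` raises an error for an event whose StartTime equals its EndTime, so zero-length events would need separate handling.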
Upvotes: 0
Reputation: 416
EDIT
I think this result is what you are looking for.
structure(list(EventId = 1:2, Day = c("Mon", "Tues"), StartTime = 1:2, EndTime = c(3L, 5L)),
.Names = c("EventId", "Day", "StartTime", "EndTime"),
row.names = c(NA, -2L), class = "data.frame") -> test_df
library(dplyr)
library(tidyr)
generate_val <- function(startT, endT){
(seq(from = startT, to = (endT-1), by = 1))
}
test_df %>%
rowwise() %>%
do(new_vars = generate_val(.$StartTime, .$EndTime)) %>%
bind_cols(test_df %>% select(-c(StartTime:EndTime))) %>%
unnest(new_vars)
# A tibble: 5 x 3
EventId Day new_vars
<int> <chr> <dbl>
1 1 Mon 1
2 1 Mon 2
3 2 Tues 2
4 2 Tues 3
5 2 Tues 4
To wrap this in a function that takes the column names as arguments, you would have to follow the non-standard evaluation (NSE) approach described in the "Programming with dplyr" vignette.
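As a rough illustration of that NSE approach, here is a sketch that passes the start and end columns as unquoted arguments using the `{{ }}` (embrace) operator. It assumes a reasonably recent dplyr (1.0 or later) and tidyr (1.0 or later); the function name `expand_intervals` and its argument names `start` and `end` are hypothetical and not from the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical wrapper: `start` and `end` are unquoted column names.
expand_intervals <- function(data, start, end) {
  data %>%
    # Build a list-column holding the hours start, start + 1, ..., end - 1
    mutate(Time = map2({{ start }}, {{ end }}, ~ seq(.x, .y - 1))) %>%
    # Drop the original interval columns
    select(-c({{ start }}, {{ end }})) %>%
    # Expand the list-column into one row per hour
    unnest(cols = Time)
}

events <- data.frame(EventId = 1:2, Day = c("Mon", "Tues"),
                     StartTime = 1:2, EndTime = c(3L, 5L))
res <- expand_intervals(events, StartTime, EndTime)
print(res)
```

If the column names are fixed, as in the question, a plain function without `{{ }}` works just as well.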
If you don't mind using tidyr, this should work. It might be a little slower than data.table for large datasets (more than 1 million rows).
library(tidyr)
test_df %>%
gather(., key = Time_type, value = Time, -EventId, -Day)
EventId Day Time_type Time
1 1 Mon StartTime 1
2 2 Tues StartTime 2
3 1 Mon EndTime 3
4 2 Tues EndTime 5
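The `gather()` output above is in long format but still has only the start and end rows, so the in-between hours have to be generated separately. For comparison, here is a dependency-free sketch in base R that does the full expansion with `rep()` and `sequence()`. The function name `expand_events_base` is made up for illustration, and the sketch assumes EndTime is never smaller than StartTime.

```r
# Hypothetical helper; column names are taken from the question.
expand_events_base <- function(df) {
  # Number of hourly rows per event (EndTime itself is excluded)
  n_hours <- df$EndTime - df$StartTime
  # Repeat each row index once per hour it covers
  idx <- rep(seq_len(nrow(df)), times = n_hours)
  data.frame(
    EventId = df$EventId[idx],
    Day = df$Day[idx],
    # sequence() restarts at 1 for each event, giving the offset within it
    Time = df$StartTime[idx] + sequence(n_hours) - 1L
  )
}

events <- data.frame(EventId = 1:2, Day = c("Mon", "Tues"),
                     StartTime = 1:2, EndTime = c(3L, 5L))
out <- expand_events_base(events)
print(out)
```

Unlike the `seq()`-based versions, this one simply produces no rows for an event whose StartTime equals its EndTime.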
Upvotes: 0