Jeffrey Schrad

Reputation: 31

setDT, Error in x[j]: invalid subscript type 'list', when executed as part of a function

I have a data frame of event records with start times and end times. I want to expand the records into multiple records of consistent time intervals, say hour-long intervals. For example, let's say the data frame contains two records:

EventId   Day   StartTime   EndTime
1         Mon   1           3
2         Tues  2           5

My desired new data frame should look like this:

EventId   Day    Time
1         Mon    1
1         Mon    2
2         Tues   2
2         Tues   3
2         Tues   4
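
For reproducibility, the two tables above can be built in base R roughly as follows. This is only a sketch: the object names df and expected are illustrative, and the column types (numeric times, character days) are assumptions, since the question does not state them.

```r
# Example input: two event records (column types are assumed)
df <- data.frame(
    EventId   = 1:2,
    Day       = c("Mon", "Tues"),
    StartTime = c(1, 2),
    EndTime   = c(3, 5),
    stringsAsFactors = FALSE
)

# Desired result: one row per hour, with the end time itself excluded
expected <- data.frame(
    EventId = c(1, 1, 2, 2, 2),
    Day     = c("Mon", "Mon", "Tues", "Tues", "Tues"),
    Time    = c(1, 2, 2, 3, 4),
    stringsAsFactors = FALSE
)
```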

My function uses data.table::setDT to expand the records like this:

makeIncrementalRecords <- function(df) {
    new <- data.table::setDT(df)[,
                                 .(Time = seq(StartTime,
                                              EndTime,
                                              by = 1)),
                                 by = .(EventId, Day)]
    new
}

Executing this line by line, I have no issues, and I get the result I want. When I execute the function as part of a loaded package, I get the following error...

Error in x[j]: invalid subscript type 'list'

I am completely baffled as to why this code would suddenly stop working when executed as a function. I guess it has something to do with the local function environment. It works when I create the data frame as 'df' in the global environment and just execute the setDT function in the console.

Any suggestions?

Thanks

Upvotes: 0

Views: 210

Answers (2)

AndS.

Reputation: 8120

Here is a solution similar to that proposed in the comments.

library(tidyverse)

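makeIncrementalRecords <- function(data){
    data %>% 
        # one list element per row: the hours from StartTime up to, but not including, EndTime
        mutate(Time = map2(StartTime, EndTime, ~seq(.x, .y-1))) %>% 
        # name the list column explicitly; newer tidyr versions warn on a bare unnest()
        unnest(Time) %>% 
        select(EventId, Day, Time)
}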

makeIncrementalRecords(df)
#   EventId Day    Time
# 1       1 Mon       1
# 2       1 Mon       2
# 3       2 Tues      2
# 4       2 Tues      3
# 5       2 Tues      4

Or, if you want to keep it with data.table:

makeIncrementalRecords2 <- function(data){
    data.table::setDT(data)[, .(Time = seq(StartTime, EndTime-1, by = 1)), by = .(EventId, Day)]
}

makeIncrementalRecords2(df)
#    EventId  Day Time
# 1:       1  Mon    1
# 2:       1  Mon    2
# 3:       2 Tues    2
# 4:       2 Tues    3
# 5:       2 Tues    4
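
A likely reason the same data.table code fails only inside a loaded package is offered here as an assumption, because the question does not show the package's DESCRIPTION or NAMESPACE. data.table checks whether the calling package is data.table-aware. If the package does not import data.table, the [ call falls back to plain data.frame semantics, which typically surfaces as errors like the one in the question. A minimal sketch of the fix using a roxygen2 tag follows; the file name is hypothetical, and Imports: data.table in DESCRIPTION is needed as well.

```r
# R/makeIncrementalRecords.R  (hypothetical file inside the package)

#' Expand event records into one row per hour
#'
#' @param df A data frame with EventId, Day, StartTime and EndTime columns.
#' @return A data.table with EventId, Day and Time columns.
#' @import data.table
#' @export
makeIncrementalRecords <- function(df) {
    # as.data.table() copies, so the caller's data frame is not
    # converted by reference the way setDT() would convert it
    dt <- data.table::as.data.table(df)
    dt[, .(Time = seq(StartTime, EndTime - 1, by = 1)), by = .(EventId, Day)]
}
```

The @import tag makes roxygen2 write import(data.table) into NAMESPACE, which is what marks the package as data.table-aware. Setting .datatable.aware <- TRUE in the package code is the alternative documented by data.table for packages that cannot import it.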

Upvotes: 0

Suhas Hegde

Reputation: 416

EDIT

I think this result is what you are looking for.

structure(list(EventId = 1:2, Day = c("Mon", "Tues"), StartTime = 1:2, EndTime = c(3L, 5L)), 
          .Names = c("EventId", "Day", "StartTime", "EndTime"), 
          row.names = c(NA, -2L), class = "data.frame") -> test_df 


library(dplyr)
library(tidyr)

generate_val <- function(startT, endT){
  (seq(from = startT, to = (endT-1), by = 1))
}

test_df %>%
  rowwise() %>%
  do(new_vars = generate_val(.$StartTime, .$EndTime)) %>%
  bind_cols(test_df %>% select(-c(StartTime:EndTime))) %>%
  unnest()

# A tibble: 5 x 3
  EventId Day   new_vars
    <int> <chr>    <dbl>
1       1 Mon          1
2       1 Mon          2
3       2 Tues         2
4       2 Tues         3
5       2 Tues         4

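To package this in a function with fixed column names, you can wrap the pipeline as it is. The NSE (tidy evaluation) procedure described in "Programming with dplyr" is only needed if the column names are passed in as function arguments.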
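
As a sketch of what such a function could look like: with fixed column names, as in this example, a plain wrapper appears to be enough, and tidy evaluation only becomes necessary when columns are supplied as arguments. The name expand_events is hypothetical, and the code assumes dplyr >= 1.0 and tidyr >= 1.0 (it uses mutate() on a rowwise data frame instead of do()).

```r
library(dplyr)
library(tidyr)

# Hypothetical wrapper; column names are hard-coded, so no tidy evaluation is needed
expand_events <- function(data) {
  data %>%
    rowwise() %>%
    # one list element per row: hours from StartTime up to, but not including, EndTime
    mutate(Time = list(seq(from = StartTime, to = EndTime - 1, by = 1))) %>%
    ungroup() %>%
    select(EventId, Day, Time) %>%
    # expand the list column into one row per hour
    unnest(cols = Time)
}
```

Calling expand_events(test_df) should then give one row per event and hour.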


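If you don't mind using tidyr, this should work. It might be a little slower than data.table for large datasets (rows > 1 million). Note that this only reshapes the start and end times into long format; it does not generate the intermediate hours.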

library(tidyr)

test_df %>%
  gather(., key = Time_type, value = Time, -EventId, -Day)

  EventId  Day Time_type Time
1       1  Mon StartTime    1
2       2 Tues StartTime    2
3       1  Mon   EndTime    3
4       2 Tues   EndTime    5

Upvotes: 0
