NiklasvM
NiklasvM

Reputation: 15

Subsetting a R data.table with a function call: How to return a vector of results instead of just one result?

I am trying to subset a data.table based on a function that I wrote.

I have found out that R only calls my function once (instead of once per row). However, I am having trouple debugging this as I can only address Date in the context of the data.table.

For runtime purposes, I cannot use a for-loop to check each row individually. If this problem could be solved without a self-defined function, I would be happy, too.

library("data.table")
library("zoo")

dt <- data.table(Date = as.Date(1:10),
                 Sales = c(rep(0, 5), rep(1, 5)))
anySalesLastWeek <- function(date) {
  return(!empty(dt[as.integer(Date - as.Date(date)) %in% -7:-1 & Sales > 0, ]))
}
dt[anySalesLastWeek(Date), ]

I expected R to call the function anySalesLastWeek() once per row with the column value of Date in that row. Instead, the function seems to be called only once with dt$Date as the input.

The expected output would be the first six rows.

Upvotes: 1

Views: 107

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

Not sure if I understood it correctly. Here is an approach using non-equi join:

DT[, oneweekago := Date - 7]
DT[, anySalesLastWeek := 
    DT[DT, on=.(Date>=oneweekago, Date<Date), allow.cartesian=TRUE, 
        sum(Sales, na.rm=TRUE) > 0, by=.EACHI]$V1
]

output:

          Date Sales oneweekago anySalesLastWeek
 1: 2019-10-04     0 2019-09-27            FALSE
 2: 2019-10-05     0 2019-09-28            FALSE
 3: 2019-10-06     0 2019-09-29            FALSE
 4: 2019-10-07     0 2019-09-30            FALSE
 5: 2019-10-08     0 2019-10-01            FALSE
 6: 2019-10-09     1 2019-10-02            FALSE
 7: 2019-10-10     1 2019-10-03             TRUE
 8: 2019-10-11     1 2019-10-04             TRUE
 9: 2019-10-12     1 2019-10-05             TRUE
10: 2019-10-13     1 2019-10-06             TRUE

data:

library("data.table")
DT <- data.table(Date=seq(Sys.Date(), by="1 day", length.out=10), Sales=c(rep(0, 5), rep(1, 5)))

#          Date Sales
# 1: 2019-10-04     0
# 2: 2019-10-05     0
# 3: 2019-10-06     0
# 4: 2019-10-07     0
# 5: 2019-10-08     0
# 6: 2019-10-09     1
# 7: 2019-10-10     1
# 8: 2019-10-11     1
# 9: 2019-10-12     1
#10: 2019-10-13     1

edit: regarding OP's function and also from Rohit comment, you can fix it as follows:

anySalesLastWeek <- function(date) {
    DT[as.integer(Date - as.Date(date)) %in% -7:-1 & Sales > 0, .N>0]
}
DT[sapply(Date, anySalesLastWeek)]

Upvotes: 2

Related Questions