Reputation: 15
I am trying to subset a data.table based on a function that I wrote.
I have found out that R only calls my function once (instead of once per row). However, I am having trouple debugging this as I can only address Date in the context of the data.table.
For runtime purposes, I cannot use a for-loop to check each row individually. If this problem could be solved without a self-defined function, I would be happy, too.
library("data.table")
library("zoo")
dt <- data.table(Date = as.Date(1:10),
Sales = c(rep(0, 5), rep(1, 5)))
anySalesLastWeek <- function(date) {
return(!empty(dt[as.integer(Date - as.Date(date)) %in% -7:-1 & Sales > 0, ]))
}
dt[anySalesLastWeek(Date), ]
I expected R to call the function anySalesLastWeek() once per row with the column value of Date in that row. Instead, the function seems to be called only once with dt$Date as the input.
The expected output would be the first six rows.
Upvotes: 1
Views: 107
Reputation: 25225
Not sure if I understood it correctly. Here is an approach using non-equi join:
DT[, oneweekago := Date - 7]
DT[, anySalesLastWeek :=
DT[DT, on=.(Date>=oneweekago, Date<Date), allow.cartesian=TRUE,
sum(Sales, na.rm=TRUE) > 0, by=.EACHI]$V1
]
output:
Date Sales oneweekago anySalesLastWeek
1: 2019-10-04 0 2019-09-27 FALSE
2: 2019-10-05 0 2019-09-28 FALSE
3: 2019-10-06 0 2019-09-29 FALSE
4: 2019-10-07 0 2019-09-30 FALSE
5: 2019-10-08 0 2019-10-01 FALSE
6: 2019-10-09 1 2019-10-02 FALSE
7: 2019-10-10 1 2019-10-03 TRUE
8: 2019-10-11 1 2019-10-04 TRUE
9: 2019-10-12 1 2019-10-05 TRUE
10: 2019-10-13 1 2019-10-06 TRUE
data:
library("data.table")
DT <- data.table(Date=seq(Sys.Date(), by="1 day", length.out=10), Sales=c(rep(0, 5), rep(1, 5)))
# Date Sales
# 1: 2019-10-04 0
# 2: 2019-10-05 0
# 3: 2019-10-06 0
# 4: 2019-10-07 0
# 5: 2019-10-08 0
# 6: 2019-10-09 1
# 7: 2019-10-10 1
# 8: 2019-10-11 1
# 9: 2019-10-12 1
#10: 2019-10-13 1
edit: regarding OP's function and also from Rohit comment, you can fix it as follows:
anySalesLastWeek <- function(date) {
DT[as.integer(Date - as.Date(date)) %in% -7:-1 & Sales > 0, .N>0]
}
DT[sapply(Date, anySalesLastWeek)]
Upvotes: 2