Reputation: 77
I'm attempting to create a function that takes a datatable & a variable as arguments. The data table will always have 4 columns (1st is a date column, the other 3 are numeric) but the number of rows will differ. The variable is an integer that is set to act as cutoff. The goal of the function is to output the datatable with all values in the numeric columns greater than the first number larger than the variable. Here is a snippet of the datatable being tested
#datatable dt
> dput(dt[30:40])
structure(list(a = structure(c(18517, 18524, 18531, 18538, 18545,
18552, 18559, 18566, 18573, 18580, 18587), class = "Date"), b = c(14L,
16L, 18L, 21L, 23L, 26L, 29L, 32L, 35L, 39L, 42L), c = c(9L,
10L, 12L, 14L, 16L, 18L, 21L, 23L, 26L, 29L, 32L), d = c(4L,
5L, 6L, 8L, 9L, 11L, 13L, 16L, 18L, 20L, 23L)), row.names = c(NA,
-11L), class = c("data.table", "data.frame"))
> dt[30:40]
a b c d
1: 2020-09-12 14 9 4
2: 2020-09-19 16 10 5
3: 2020-09-26 18 12 6
4: 2020-10-03 21 14 8
5: 2020-10-10 23 16 9
6: 2020-10-17 26 18 11
7: 2020-10-24 29 21 13
8: 2020-10-31 32 23 16
9: 2020-11-07 35 26 18
10: 2020-11-14 39 29 20
11: 2020-11-21 42 32 23
Here is the function I've come up with:
cutoff <- 21 #some integer
checkDT <- function(dt, cutoff){
columns <- c('b','c','d')
for (i in columns){
for (j in dt[,..columns]){
if(is.infinite(min(j[which(j > cutoff)]))){
dt <- dt
}else{
dt[i > min(j[which(j > cutoff)]), `:=` (i = NA)]
}
}
return(dt)
}
}
This outputs a datatable with a fifth column i that is all NA. If I use this statement for a specific column than the output is as expected but I'm trying to have the function perform this to get rid of some lines of code.
if(is.infinite(min(dt$b[which(dt$b > cutoff)]))){
dt <- dt
} else{
dt[b > min(dt$b[which(dt$b > cutoff)]), `:=`(b = NA)]
}
> dt[30:40]
a b c d
1: 2020-09-12 14 9 4
2: 2020-09-19 16 10 5
3: 2020-09-26 18 12 6
4: 2020-10-03 21 14 8
5: 2020-10-10 23 16 9
6: 2020-10-17 NA 18 11
7: 2020-10-24 NA 21 13
8: 2020-10-31 NA 23 16
9: 2020-11-07 NA 26 18
10: 2020-11-14 NA 29 20
11: 2020-11-21 NA 32 23
This is the expected output with a cutoff value of 21:
a b c d
1: 2020-09-12 14 9 4
2: 2020-09-19 16 10 5
3: 2020-09-26 18 12 6
4: 2020-10-03 21 14 8
5: 2020-10-10 23 16 9
6: 2020-10-17 NA 18 11
7: 2020-10-24 NA 21 13
8: 2020-10-31 NA 23 16
9: 2020-11-07 NA NA 18
10: 2020-11-14 NA NA 20
11: 2020-11-21 NA NA 23
Upvotes: 0
Views: 71
Reputation: 389047
Here's another way using lapply
and .SDcols
.
checkDT <- function(dt1, cutoff) {
columns <- c('b','c','d')
dt1[, (columns) := lapply(.SD, function(x)
replace(x, x > x[x > cutoff][1], NA)), .SDcols = columns][]
}
checkDT(dt, 21)
# a b c d
# 1: 2020-09-12 14 9 4
# 2: 2020-09-19 16 10 5
# 3: 2020-09-26 18 12 6
# 4: 2020-10-03 21 14 8
# 5: 2020-10-10 23 16 9
# 6: 2020-10-17 NA 18 11
# 7: 2020-10-24 NA 21 13
# 8: 2020-10-31 NA 23 16
# 9: 2020-11-07 NA NA 18
#10: 2020-11-14 NA NA 20
#11: 2020-11-21 NA NA 23
Upvotes: 2
Reputation: 2414
I simplified a lot of your notation here
In data.table you don't have to use dt$ again inside the brackets
The which() isn't necessary because the logical vector can be used directly to indicate which rows to keep.
The key is using the get
function to translate the text to a column name
I just used suppressWarnings to get rid of the infinity warnings,
The code doesn't replace anything in that case and that's what you want.
checkDT <- function(dt, cutoff) {
columns <- c('b', 'c', 'd')
for (i in columns) {
suppressWarnings(dt[get(i) > min(dt[get(i) > cutoff, get(i)]), (i) := NA])
}
dt[]
}
checkDT(dt, cutoff)
gives the desired result
Upvotes: 1