lagn91
lagn91

Reputation: 77

Loop through datatable & alter values meeting a specific condition

I'm attempting to create a function that takes a datatable & a variable as arguments. The data table will always have 4 columns (1st is a date column, the other 3 are numeric) but the number of rows will differ. The variable is an integer that is set to act as cutoff. The goal of the function is to output the datatable with all values in the numeric columns greater than the first number larger than the variable. Here is a snippet of the datatable being tested

#datatable dt
> dput(dt[30:40])
structure(list(a = structure(c(18517, 18524, 18531, 18538, 18545, 
18552, 18559, 18566, 18573, 18580, 18587), class = "Date"), b = c(14L, 
16L, 18L, 21L, 23L, 26L, 29L, 32L, 35L, 39L, 42L), c = c(9L, 
10L, 12L, 14L, 16L, 18L, 21L, 23L, 26L, 29L, 32L), d = c(4L, 
5L, 6L, 8L, 9L, 11L, 13L, 16L, 18L, 20L, 23L)), row.names = c(NA, 
-11L), class = c("data.table", "data.frame"))
> dt[30:40]
             a  b  c  d
 1: 2020-09-12 14  9  4
 2: 2020-09-19 16 10  5
 3: 2020-09-26 18 12  6
 4: 2020-10-03 21 14  8
 5: 2020-10-10 23 16  9
 6: 2020-10-17 26 18 11
 7: 2020-10-24 29 21 13
 8: 2020-10-31 32 23 16
 9: 2020-11-07 35 26 18
10: 2020-11-14 39 29 20
11: 2020-11-21 42 32 23

Here is the function I've come up with:

cutoff <-  21 #some integer
checkDT <- function(dt, cutoff){
  columns <- c('b','c','d')
  for (i in columns){
    for (j in dt[,..columns]){
      if(is.infinite(min(j[which(j > cutoff)]))){
       dt <- dt
      }else{
       dt[i > min(j[which(j > cutoff)]), `:=` (i = NA)]
      }
     }
   return(dt)
  }
}

This outputs a datatable with a fifth column i that is all NA. If I use this statement for a specific column than the output is as expected but I'm trying to have the function perform this to get rid of some lines of code.

if(is.infinite(min(dt$b[which(dt$b > cutoff)]))){
    dt <- dt
  } else{
    dt[b > min(dt$b[which(dt$b > cutoff)]), `:=`(b = NA)] 
  }
> dt[30:40]
             a  b  c  d
 1: 2020-09-12 14  9  4
 2: 2020-09-19 16 10  5
 3: 2020-09-26 18 12  6
 4: 2020-10-03 21 14  8
 5: 2020-10-10 23 16  9
 6: 2020-10-17 NA 18 11
 7: 2020-10-24 NA 21 13
 8: 2020-10-31 NA 23 16
 9: 2020-11-07 NA 26 18
10: 2020-11-14 NA 29 20
11: 2020-11-21 NA 32 23

This is the expected output with a cutoff value of 21:

             a  b  c  d
 1: 2020-09-12 14  9  4
 2: 2020-09-19 16 10  5
 3: 2020-09-26 18 12  6
 4: 2020-10-03 21 14  8
 5: 2020-10-10 23 16  9
 6: 2020-10-17 NA 18 11
 7: 2020-10-24 NA 21 13
 8: 2020-10-31 NA 23 16
 9: 2020-11-07 NA NA 18
10: 2020-11-14 NA NA 20
11: 2020-11-21 NA NA 23

Upvotes: 0

Views: 71

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389047

Here's another way using lapply and .SDcols.

checkDT <- function(dt1, cutoff) {
  columns <- c('b','c','d')
  dt1[, (columns) := lapply(.SD, function(x) 
          replace(x, x > x[x > cutoff][1], NA)), .SDcols = columns][]
}

checkDT(dt, 21)

#             a  b  c  d
# 1: 2020-09-12 14  9  4
# 2: 2020-09-19 16 10  5
# 3: 2020-09-26 18 12  6
# 4: 2020-10-03 21 14  8
# 5: 2020-10-10 23 16  9
# 6: 2020-10-17 NA 18 11
# 7: 2020-10-24 NA 21 13
# 8: 2020-10-31 NA 23 16
# 9: 2020-11-07 NA NA 18
#10: 2020-11-14 NA NA 20
#11: 2020-11-21 NA NA 23

Upvotes: 2

Brian Montgomery
Brian Montgomery

Reputation: 2414

I simplified a lot of your notation here
In data.table you don't have to use dt$ again inside the brackets
The which() isn't necessary because the logical vector can be used directly to indicate which rows to keep.
The key is using the get function to translate the text to a column name
I just used suppressWarnings to get rid of the infinity warnings,
The code doesn't replace anything in that case and that's what you want.

checkDT <- function(dt, cutoff) {
  columns <- c('b', 'c', 'd')
  for (i in columns) {
    suppressWarnings(dt[get(i) > min(dt[get(i) > cutoff, get(i)]), (i) := NA]) 
  }
  dt[]
}

checkDT(dt, cutoff) gives the desired result

Upvotes: 1

Related Questions