Mark
Mark

Reputation: 173

Row conditional column operations in data.table

I have a large data.table where I for each row need to make computations based on part of the full data.table. As an example consider the following data.table, and assume I for each row want to compute the sum of the num variable for every rows where id2 matches id1 for the current row as well as the time variable is within distance 1 from the time of the current row.

set.seed(123)

dat <- data.table(cbind(id1=sample(1:5,10,replace=T),
                        id2=sample(1:5,10,replace=T),
                        num=sample(1:10,10,replace=T),
                        time=sample(1:10,10,replace=T)))

This could easily be done by looping over each row like this

dat[,val:= 0]
for (i in 1:nrow(dat)){
  this.val <- dat[ (id2==id1[i]) & (time>=time[i]-2) & (time<=time[i]+2),sum(num)]
  dat[i,val:=this.val]
}

dat

The resulting data.table looks like this:

   > dat
        id1 id2 num time val
     1:   2   5   9   10   6
     2:   4   3   7   10   0
     3:   3   4   7    7  10
     4:   5   3  10    8   9
     5:   5   1   7    1   2
     6:   1   5   8    5   6
     7:   3   2   6    8  17
     8:   5   1   6    3  10
     9:   3   2   3    4   0
    10:   3   5   2    3   0

What is the proper/fast way to do things like this using data.table?

Upvotes: 12

Views: 316

Answers (1)

akrun
akrun

Reputation: 886938

We can use a self-join here by creating the 'timeminus2' and 'timeplus2' column, join on by 'id2' with 'id1' and the non-equi logical condition to get the sum of 'num' and assign (:=) the 'val' column to the original dataset

tmp <- dat[.(id1 = id1, timeminus2 = time - 2, timeplus2 = time + 2), 
             .(val = sum(num)),
             on = .(id2 = id1, time >= timeminus2, time <= timeplus2),
             by = .EACHI
         ][is.na(val), val := 0][]
dat[, val := tmp$val][]
#     id1 id2 num time val
# 1:   2   5   9   10   6
# 2:   4   3   7   10   0
# 3:   3   4   7    7  10
# 4:   5   3  10    8   9
# 5:   5   1   7    1   2
# 6:   1   5   8    5   6
# 7:   3   2   6    8  17
# 8:   5   1   6    3  10
# 9:   3   2   3    4   0
#10:   3   5   2    3   0

Upvotes: 5

Related Questions