Jack
Jack

Reputation: 197

Operating on data table based off of multiple row and column conditions

Say I have the following data table:

prc = c(2,5,4,6,7,3)
ret = c(0.01,0.03,-0.02,0.01,-0.001,0.04)
cap = c(100,200,300,80,40,900)
comp = c("a","a","a","b","b","c")
dt = data.table(comp, prc, ret, cap)

   comp prc    ret cap
1:    a   2  0.010 100
2:    a   5  0.030 200
3:    a   4 -0.020 300
4:    b   6  0.010  80
5:    b   7 -0.001  40
6:    c   3  0.040 900

I want to operate based off of the following conditions: For each company, at row t, there must be a prc at row (t - 2), ret at row (t - 1), and cap at (t-2). Therefore, the only row in which I would be operating on is 3.

I have tried this:

> dt[, which(is.na(shift(prc,2)) | is.na(shift(ret,1)) | is.na(shift(v,2))), by = comp]
   comp V1
1:    a  1
2:    a  2
3:    b  1
4:    b  2
5:    c  1

As anticipated, the only row not included is the row in which comp = a and prc = 4. Now what could I do to to make a mathematical operation on that row. Note that I do not want to delete the rest of the rows because they will be used in the calculation for the desired row.

Upvotes: 1

Views: 109

Answers (2)

Uwe
Uwe

Reputation: 42582

The OP already was on the right track but stopped halfways.

dt[, which(is.na(shift(prc, 2)) | is.na(shift(ret, 1)) | is.na(shift(cap, 2))), by = comp]
#   comp V1
#1:    a  1
#2:    a  2
#3:    b  1
#4:    b  2
#5:    c  1

selects all rows within each group which do not fulfill the requirements. (Note that in the original code the OP referenced v which isn't present in the data set).

By negating,

dt[, which(!(is.na(shift(prc, 2)) | is.na(shift(ret, 1)) | is.na(shift(cap, 2)))), by = comp]
#   comp V1
#1:    a  3

selects all rows within each group which fulfill the requirements.

dt[, .SD[!is.na(shift(prc, 2)) & !is.na(shift(ret, 1)) & !is.na(shift(cap, 2))], by = comp]
#   comp prc   ret cap
#1:    a   4 -0.02 300

returns all rows of the original data.table which fulfill the condition including all columns.

Note that (a) Boolean algebra has been used to replace the condition !(A | B | C) by !A & !B & !C and (b) the call to which() has been made redundant.


The solution of akrun can be streamlined in a similar way. Instead of

dt[dt[, .I[seq_len(.N)>=3], comp]$V1]

we can write

dt[, .SD[seq_len(.N) >= 3L], comp]

Note that both approaches are not fully equivalent. While the latter approach simply shows all rows for each comp which have at least two preceeding rows, the first approach additionally checks also the contents for NA which I believe is more reliable.

Upvotes: 0

akrun
akrun

Reputation: 887851

May be this helps

dt[dt[, .I[seq_len(.N)>=3], comp]$V1]
#   comp prc   ret cap
#1:    a   4 -0.02 300

If the intention is not to subset, but to have to new column, say 'Flag' that show TRUE/FALSE values

dt[,  Flag := seq_len(.N)>=3, comp]

Upvotes: 1

Related Questions