Reputation: 197
Say I have the following data table:
prc = c(2,5,4,6,7,3)
ret = c(0.01,0.03,-0.02,0.01,-0.001,0.04)
cap = c(100,200,300,80,40,900)
comp = c("a","a","a","b","b","c")
dt = data.table(comp, prc, ret, cap)
comp prc ret cap
1: a 2 0.010 100
2: a 5 0.030 200
3: a 4 -0.020 300
4: b 6 0.010 80
5: b 7 -0.001 40
6: c 3 0.040 900
I want to operate based off of the following conditions: For each company, at row t, there must be a prc at row (t - 2), ret at row (t - 1), and cap at (t-2). Therefore, the only row in which I would be operating on is 3.
I have tried this:
> dt[, which(is.na(shift(prc,2)) | is.na(shift(ret,1)) | is.na(shift(v,2))), by = comp]
comp V1
1: a 1
2: a 2
3: b 1
4: b 2
5: c 1
As anticipated, the only row not included is the row in which comp = a and prc = 4. Now what could I do to to make a mathematical operation on that row. Note that I do not want to delete the rest of the rows because they will be used in the calculation for the desired row.
Upvotes: 1
Views: 109
Reputation: 42582
The OP already was on the right track but stopped halfways.
dt[, which(is.na(shift(prc, 2)) | is.na(shift(ret, 1)) | is.na(shift(cap, 2))), by = comp]
# comp V1
#1: a 1
#2: a 2
#3: b 1
#4: b 2
#5: c 1
selects all rows within each group which do not fulfill the requirements. (Note that in the original code the OP referenced v
which isn't present in the data set).
By negating,
dt[, which(!(is.na(shift(prc, 2)) | is.na(shift(ret, 1)) | is.na(shift(cap, 2)))), by = comp]
# comp V1
#1: a 3
selects all rows within each group which fulfill the requirements.
dt[, .SD[!is.na(shift(prc, 2)) & !is.na(shift(ret, 1)) & !is.na(shift(cap, 2))], by = comp]
# comp prc ret cap
#1: a 4 -0.02 300
returns all rows of the original data.table which fulfill the condition including all columns.
Note that (a) Boolean algebra has been used to replace the condition !(A | B | C)
by !A & !B & !C
and (b) the call to which()
has been made redundant.
The solution of akrun can be streamlined in a similar way. Instead of
dt[dt[, .I[seq_len(.N)>=3], comp]$V1]
we can write
dt[, .SD[seq_len(.N) >= 3L], comp]
Note that both approaches are not fully equivalent. While the latter approach simply shows all rows for each comp
which have at least two preceeding rows, the first approach additionally checks also the contents for NA
which I believe is more reliable.
Upvotes: 0
Reputation: 887851
May be this helps
dt[dt[, .I[seq_len(.N)>=3], comp]$V1]
# comp prc ret cap
#1: a 4 -0.02 300
If the intention is not to subset, but to have to new column, say 'Flag' that show TRUE/FALSE values
dt[, Flag := seq_len(.N)>=3, comp]
Upvotes: 1