Reputation: 75
Rows are grouped by the variables grp1, grp2, grp3. For each value of varD in the table below, I need to return the minimum value of varA from the minimum of all values of VarC that are >= varD? The returned value is stored in the column varX
On a given row, the particular varC and varD are not pairs. Rather varD needs to be compared to all varC when calculating the minimum varC greater than varD.
varA, varB and varC values are paired, with varC being the cumulative sum of varB.
grp1 grp2(POSIXct) grp3 varA varB varC varD
A 02/02/2019 05:30:00 -30 -100 1661 1661 280
A 02/02/2019 05:30:00 -30 0 0 1661 560
A 02/02/2019 05:30:00 -30 57 720 2381 840
A 02/02/2019 05:30:00 -30 59 0 2381 1120
A 02/02/2019 05:30:00 -30 70 0 2381 1400
A 02/02/2019 05:30:00 -30 77 0 2381 1680
A 02/02/2019 05:30:00 -30 91 80 2461 1960
A 02/02/2019 05:30:00 -30 93 0 2461 2240
A 02/02/2019 05:30:00 -30 95 0 2461 2520
A 02/02/2019 05:30:00 -30 99 340 2801 2800
The expected results for a group should be:
grp1 grp2(POSIXct) grp3 varA varB varC varD varX
A 02/02/2019 05:30:00 -30 -100 1661 1661 280 -100
A 02/02/2019 05:30:00 -30 0 0 1661 560 -100
A 02/02/2019 05:30:00 -30 57 720 2381 840 -100
A 02/02/2019 05:30:00 -30 59 0 2381 1120 -100
A 02/02/2019 05:30:00 -30 70 0 2381 1400 -100
A 02/02/2019 05:30:00 -30 77 0 2381 1680 57
A 02/02/2019 05:30:00 -30 91 80 2461 1960 57
A 02/02/2019 05:30:00 -30 93 0 2461 2240 57
A 02/02/2019 05:30:00 -30 95 0 2461 2520 99
A 02/02/2019 05:30:00 -30 99 340 2801 2800 99
For varD:
Upvotes: 2
Views: 87
Reputation: 25225
You can find the min by groups first then use a rolling join to look the nearest min:
agg <- DT[, min(varA), by=.(grp1, grp2, grp3, varC)]
DT[, newvar :=
agg[DT, on=c("grp1", "grp2", "grp3", varC="varD"), roll=-Inf]$V1
]
output:
grp1 grp2 grp3 varA varB varC varD newvar
1: A 02/02/2019T05:30:00 -30 -100 1661 1661 280 -100
2: A 02/02/2019T05:30:00 -30 0 0 1661 560 -100
3: A 02/02/2019T05:30:00 -30 57 720 2381 840 -100
4: A 02/02/2019T05:30:00 -30 59 0 2381 1120 -100
5: A 02/02/2019T05:30:00 -30 70 0 2381 1400 -100
6: A 02/02/2019T05:30:00 -30 77 0 2381 1680 57
7: A 02/02/2019T05:30:00 -30 91 80 2461 1960 57
8: A 02/02/2019T05:30:00 -30 93 0 2461 2240 57
9: A 02/02/2019T05:30:00 -30 95 0 2461 2520 99
10: A 02/02/2019T05:30:00 -30 99 340 2801 2800 99
data:
library(data.table)
DT <- fread("grp1 grp2 grp3 varA varB varC varD
A 02/02/2019T05:30:00 -30 -100 1661 1661 280
A 02/02/2019T05:30:00 -30 0 0 1661 560
A 02/02/2019T05:30:00 -30 57 720 2381 840
A 02/02/2019T05:30:00 -30 59 0 2381 1120
A 02/02/2019T05:30:00 -30 70 0 2381 1400
A 02/02/2019T05:30:00 -30 77 0 2381 1680
A 02/02/2019T05:30:00 -30 91 80 2461 1960
A 02/02/2019T05:30:00 -30 93 0 2461 2240
A 02/02/2019T05:30:00 -30 95 0 2461 2520
A 02/02/2019T05:30:00 -30 99 340 2801 2800")
Upvotes: 2