Michael 96
Michael 96

Reputation: 75

Return min value for subgroups of grouped data

Rows are grouped by the variables grp1, grp2, grp3. For each value of varD in the table below, I need to return the minimum value of varA from the minimum of all values of VarC that are >= varD? The returned value is stored in the column varX

On a given row, the particular varC and varD are not pairs. Rather varD needs to be compared to all varC when calculating the minimum varC greater than varD.

varA, varB and varC values are paired, with varC being the cumulative sum of varB.

grp1   grp2(POSIXct)        grp3  varA  varB  varC  varD  
A      02/02/2019 05:30:00  -30   -100  1661  1661   280
A      02/02/2019 05:30:00  -30      0     0  1661   560
A      02/02/2019 05:30:00  -30     57   720  2381   840
A      02/02/2019 05:30:00  -30     59     0  2381  1120
A      02/02/2019 05:30:00  -30     70     0  2381  1400
A      02/02/2019 05:30:00  -30     77     0  2381  1680
A      02/02/2019 05:30:00  -30     91    80  2461  1960
A      02/02/2019 05:30:00  -30     93     0  2461  2240
A      02/02/2019 05:30:00  -30     95     0  2461  2520
A      02/02/2019 05:30:00  -30     99   340  2801  2800

The expected results for a group should be:

grp1   grp2(POSIXct)        grp3  varA  varB  varC  varD  varX
A      02/02/2019 05:30:00  -30   -100  1661  1661   280  -100
A      02/02/2019 05:30:00  -30      0     0  1661   560  -100
A      02/02/2019 05:30:00  -30     57   720  2381   840  -100  
A      02/02/2019 05:30:00  -30     59     0  2381  1120  -100
A      02/02/2019 05:30:00  -30     70     0  2381  1400  -100
A      02/02/2019 05:30:00  -30     77     0  2381  1680    57
A      02/02/2019 05:30:00  -30     91    80  2461  1960    57
A      02/02/2019 05:30:00  -30     93     0  2461  2240    57
A      02/02/2019 05:30:00  -30     95     0  2461  2520    99
A      02/02/2019 05:30:00  -30     99   340  2801  2800    99

For varD:

Upvotes: 2

Views: 87

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

You can find the min by groups first then use a rolling join to look the nearest min:

agg <- DT[, min(varA), by=.(grp1, grp2, grp3, varC)]

DT[, newvar :=
    agg[DT, on=c("grp1", "grp2", "grp3", varC="varD"), roll=-Inf]$V1
]

output:

    grp1                grp2 grp3 varA varB varC varD newvar
 1:    A 02/02/2019T05:30:00  -30 -100 1661 1661  280   -100
 2:    A 02/02/2019T05:30:00  -30    0    0 1661  560   -100
 3:    A 02/02/2019T05:30:00  -30   57  720 2381  840   -100
 4:    A 02/02/2019T05:30:00  -30   59    0 2381 1120   -100
 5:    A 02/02/2019T05:30:00  -30   70    0 2381 1400   -100
 6:    A 02/02/2019T05:30:00  -30   77    0 2381 1680     57
 7:    A 02/02/2019T05:30:00  -30   91   80 2461 1960     57
 8:    A 02/02/2019T05:30:00  -30   93    0 2461 2240     57
 9:    A 02/02/2019T05:30:00  -30   95    0 2461 2520     99
10:    A 02/02/2019T05:30:00  -30   99  340 2801 2800     99

data:

library(data.table)
DT <- fread("grp1   grp2        grp3  varA  varB  varC  varD  
A      02/02/2019T05:30:00  -30   -100  1661  1661   280
A      02/02/2019T05:30:00  -30      0     0  1661   560
A      02/02/2019T05:30:00  -30     57   720  2381   840
A      02/02/2019T05:30:00  -30     59     0  2381  1120
A      02/02/2019T05:30:00  -30     70     0  2381  1400
A      02/02/2019T05:30:00  -30     77     0  2381  1680
A      02/02/2019T05:30:00  -30     91    80  2461  1960
A      02/02/2019T05:30:00  -30     93     0  2461  2240
A      02/02/2019T05:30:00  -30     95     0  2461  2520
A      02/02/2019T05:30:00  -30     99   340  2801  2800")

Upvotes: 2

Related Questions