Reputation: 61
The solution to this simple problem has eluded me for several hours. I have a data table in which a value is identified by several classification variables (A, B, L). Where there are observations characterized by duplicate classification variables A & B, I want to retain the one that has the highest 'L'. So, if I have a table generated with this code
set.seed(17)
DT <- data.table(A=rep(c("a","b"),each=5),
B=c("a","b","c","d","d","a","b","b","c","d"),
L=c(1,1,1,2,1,1,1,2,1,1),
val=rnbinom(10, size=2, mu=3))
Making the following:
A B L val
1: a a 1 1
2: a b 1 10
3: a c 1 3
4: a d 1 5
5: a d 2 2
6: b a 1 8
7: b b 1 7
8: b b 2 1
9: b c 1 2
10: b d 1 2
I have tried commands such as
setkey(DT,A,B,L)
DT[ , .(A,B,L,val) , mult="last"]
but I'm just not getting something. I want a resulting table that looks like this
A B L val
1: a a 1 1
2: a b 1 10
3: a c 1 3
5: a d 2 2
6: b a 1 8
8: b b 2 1
9: b c 1 2
10: b d 1 2
Upvotes: 1
Views: 70
Reputation: 8516
DT[, lapply(.SD, last), .(A,B)])
should also work and seems to be a bit faster than the merge solution
Upvotes: 1
Reputation: 2777
Here's how I'd do it (without mult
)
DT[order(-L), .SD[1], .(A,B)]
With mult
something like this would do it - note that Im doing an actual join here
DT[order(L)][unique(DT[, .(A, B)]), on = c('A', 'B'), mult = 'last']
#> A B L val
#> 1: a a 1 1
#> 2: a b 1 1
#> 3: a c 1 3
#> 4: a d 2 12
#> 5: b a 1 6
#> 6: b b 2 2
#> 7: b c 1 3
#> 8: b d 1 5
Upvotes: 0
Reputation: 374
set.seed(17)
library(data.table)
DT <- data.table(A=rep(c("a","b"),each=5),
B=c("a","b","c","d","d","a","b","b","c","d"),
L=c(1,1,1,2,1,1,1,2,1,1),
val=rnbinom(10, size=2, mu=3))
result <- DT[DT[, .I[L == max(L)], by = list(A, B)]$V1]
> result
A B L val
1: a a 1 1
2: a b 1 1
3: a c 1 3
4: a d 2 12
5: b a 1 6
6: b b 2 2
7: b c 1 3
8: b d 1 5
Upvotes: 0
Reputation: 8880
solution option
library(data.table)
dt <- structure(list(A = c("a", "a", "a", "a", "a", "b", "b", "b",
"b", "b"), B = c("a", "b", "c", "d", "d", "a", "b", "b", "c",
"d"), L = c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L), val = c(1L,
10L, 3L, 5L, 2L, 8L, 7L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-10L))
setDT(dt)
merge(dt[, list(L = last(L)), by =list(A, B)], dt)
#> A B L val
#> 1: a a 1 1
#> 2: a b 1 10
#> 3: a c 1 3
#> 4: a d 2 2
#> 5: b a 1 8
#> 6: b b 2 1
#> 7: b c 1 2
#> 8: b d 1 2
Created on 2021-03-24 by the reprex package (v1.0.0)
Upvotes: 0