Reputation: 339
I have a data.table
set.seed(1)
DT <- data.table(tag = rep(LETTERS[1:4],each = 2, times = 3),
year = rep(1:3, each = 8), month = rep(c(1:2), 12),
value = runif(24, 1, 10))
DT
tag year month value
1: A 1 1 3.389578
2: A 1 2 4.349115
3: B 1 1 6.155680
4: B 1 2 9.173870
5: C 1 1 2.815137
6: C 1 2 9.085507
7: D 1 1 9.502077
8: D 1 2 6.947180
9: A 2 1 6.662026
10: A 2 2 1.556076
11: B 2 1 2.853771
12: B 2 2 2.589011
13: C 2 1 7.183206
14: C 2 2 4.456933
15: D 2 1 7.928573
16: D 2 2 5.479293
17: A 3 1 7.458567
18: A 3 2 9.927155
19: B 3 1 4.420317
20: B 3 2 7.997007
21: C 3 1 9.412347
22: C 3 2 2.909283
23: D 3 1 6.865064
24: D 3 2 2.129996
Sort this DT
by year
, month
, and -value
:
setorder(DT, year, month, -value)
will produce:
tag year month value
1: D 1 1 9.502077
2: B 1 1 6.155680
3: A 1 1 3.389578
4: C 1 1 2.815137
5: B 1 2 9.173870
6: C 1 2 9.085507
7: D 1 2 6.947180
8: A 1 2 4.349115
9: D 2 1 7.928573
10: C 2 1 7.183206
11: A 2 1 6.662026
12: B 2 1 2.853771
13: D 2 2 5.479293
14: C 2 2 4.456933
15: B 2 2 2.589011
16: A 2 2 1.556076
17: C 3 1 9.412347
18: A 3 1 7.458567
19: D 3 1 6.865064
20: B 3 1 4.420317
21: A 3 2 9.927155
22: B 3 2 7.997007
23: C 3 2 2.909283
24: D 3 2 2.129996
I would like the result to be like following:
tag year month value
1: D 1 1 9.502077
2: B 1 1 6.155680
3: B 1 2 9.173870
4: D 1 2 6.947180
5: D 2 1 7.928573
6: C 2 1 7.183206
7: D 2 2 5.479293
8: C 2 2 4.456933
9: C 3 1 9.412347
10: A 3 1 7.458567
11: A 3 2 9.927155
12: C 3 2 2.909283
The result DT
would have the following property: within each year keep only the two tags
that have larger value in month 1
. For example, year 1
, the two tags
that have larger value are D
and B
, so keep D
and B
for the whole year 1
. year 2
keeps D
and C
. At each month 1
, I need to reselect the rows with the year.
Upvotes: 1
Views: 507
Reputation: 887118
We group by 'year', get the first two elements in 'tag' for 'month' 1 (as it is already ordered), create a logical index using %in%
and subset the rows.
DT[, .SD[tag %in% head(tag[month ==1],2)], .(year)]
# year tag month value
# 1: 1 D 1 9.502077
# 2: 1 B 1 6.155680
# 3: 1 B 2 9.173870
# 4: 1 D 2 6.947180
# 5: 2 D 1 7.928573
# 6: 2 C 1 7.183206
# 7: 2 D 2 5.479293
# 8: 2 C 2 4.456933
# 9: 3 C 1 9.412347
#10: 3 A 1 7.458567
#11: 3 A 2 9.927155
#12: 3 C 2 2.909283
Upvotes: 3