Reputation: 3020
I have a data frame like the following:
structure(list(c1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 3, 2, 1, 3,
2, 1, 3, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1), c2 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b",
"c"), class = "factor")), .Names = c("c1", "c2"), row.names = c(NA,
-27L), class = "data.frame")
c1 c2
1 1 a
2 2 a
3 3 a
4 1 a
5 2 a
6 3 a
7 1 a
8 2 a
9 3 a
10 3 b
11 2 b
12 1 b
13 3 b
14 2 b
15 1 b
16 3 b
17 2 b
18 1 b
19 2 c
20 3 c
21 1 c
22 2 c
23 3 c
24 1 c
25 2 c
26 3 c
27 1 c
In the above data frame there are 3 groups of (1,2,3) for a, 3 groups of (3,2,1) for b, and 3 groups of (2,3,1) for c. I want to keep only, say, 2 groups for each of a, b and c. Is there a one-line solution for this?
The desired output looks like the following:
c1 c2
1 1 a
2 2 a
3 3 a
4 1 a
5 2 a
6 3 a
7 3 b
8 2 b
9 1 b
10 3 b
11 2 b
12 1 b
13 2 c
14 3 c
15 1 c
16 2 c
17 3 c
18 1 c
NOTE: The initial number of groups for each category of c2 (3 here) can be anything and can't be known in advance, so the solution has to be independent of this initial number of groups.
Upvotes: 1
Views: 428
Reputation: 99331
Here's an option that uses data.table. Assume df is your original data.
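library(data.table)
# convert df to a data.table by reference
setDT(df)
# .I holds row numbers: take the first two per (c1, c2) pair, then sort to restore row order
df[sort(df[, .I[1:2], by = .(c1, c2)]$V1)]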
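The number of groups we want to keep is given by 1:2 (the first two). This works because each value of c1 appears exactly once per group, so keeping the first two occurrences of every (c1, c2) pair keeps the first two groups. If you wanted more or fewer, you would change the 2 to however many groups you want to keep. The above code gives: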
c1 c2
1: 1 a
2: 2 a
3: 3 a
4: 1 a
5: 2 a
6: 3 a
7: 3 b
8: 2 b
9: 1 b
10: 3 b
11: 2 b
12: 1 b
13: 2 c
14: 3 c
15: 1 c
16: 2 c
17: 3 c
18: 1 c
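For comparison, here is a minimal base R sketch of the same idea without data.table. It makes the same assumption as the code above: every value of c1 appears exactly once in each group, so the k-th occurrence of a (c1, c2) pair belongs to the k-th group. The names n_keep, occurrence and result are made up for illustration. Because it compares a counter against n_keep instead of indexing with 1:2, it should also cope with a pair that occurs fewer than n_keep times.

```r
# Rebuild the example data from the question
df <- data.frame(
  c1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3,
         3, 2, 1, 3, 2, 1, 3, 2, 1,
         2, 3, 1, 2, 3, 1, 2, 3, 1),
  c2 = factor(rep(c("a", "b", "c"), each = 9))
)

n_keep <- 2  # hypothetical name: number of groups to keep per level of c2

# For every row, count which occurrence of its (c1, c2) pair it is:
# 1 for the first time the pair appears, 2 for the second, and so on
occurrence <- ave(seq_len(nrow(df)), df$c1, df$c2, FUN = seq_along)

# Keep only rows that belong to the first n_keep occurrences
result <- df[occurrence <= n_keep, ]
rownames(result) <- NULL  # renumber rows 1..n as in the desired output
```

Changing n_keep plays the same role as changing the 2 in 1:2 above.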
Upvotes: 2