user3664020

Reputation: 3020

remove rows at certain intervals in the data frame

I have a data frame like the following:

 structure(list(c1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 
2, 1, 3, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1), c2 = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", 
"c"), class = "factor")), .Names = c("c1", "c2"), row.names = c(NA, 
-27L), class = "data.frame")


   c1 c2
1   1  a
2   2  a
3   3  a
4   1  a
5   2  a
6   3  a
7   1  a
8   2  a
9   3  a
10  3  b
11  2  b
12  1  b
13  3  b
14  2  b
15  1  b
16  3  b
17  2  b
18  1  b
19  2  c
20  3  c
21  1  c
22  2  c
23  3  c
24  1  c
25  2  c
26  3  c
27  1  c

In the above data frame there are 3 groups of (1,2,3) for a, 3 groups of (3,2,1) for b, and 3 groups of (2,3,1) for c. I want to keep only, say, 2 groups for each of a, b and c. Is there a one-line solution for this?

The output should look like the following:

    c1 c2
1   1  a
2   2  a
3   3  a
4   1  a
5   2  a
6   3  a
7   3  b
8   2  b
9   1  b
10  3  b
11  2  b
12  1  b
13  2  c
14  3  c
15  1  c
16  2  c
17  3  c
18  1  c

NOTE: The initial number of groups for each category of c2 (3 here) can be anything and is not known in advance, so the solution has to be independent of it.

Upvotes: 1

Views: 428

Answers (1)

Rich Scriven

Reputation: 99331

Here's an option that uses data.table. Assume df to be your original data.

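library(data.table)
setDT(df)  # convert df to a data.table by reference
# .I[1:2] gives the row numbers of the first two rows for each (c1, c2) pair;
# sort() puts those row numbers back in their original order
df[sort(df[, .I[1:2], by = .(c1, c2)]$V1)]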

The number of groups to keep is given by 1:2 (the first two). To keep more or fewer, change the 2 to however many groups you want. The above code gives

    c1 c2
 1:  1  a
 2:  2  a
 3:  3  a
 4:  1  a
 5:  2  a
 6:  3  a
 7:  3  b
 8:  2  b
 9:  1  b
10:  3  b
11:  2  b
12:  1  b
13:  2  c
14:  3  c
15:  1  c
16:  2  c
17:  3  c
18:  1  c
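
If you would rather not load a package, the same idea can be sketched in base R with ave(): number each occurrence of a (c1, c2) pair and keep the rows whose occurrence number is within the limit. This is only a sketch, not part of the data.table approach above. The names n_keep, occurrence and res are made up for illustration, and it assumes (as in your example) that each value of c1 appears exactly once per group.

```r
# Rebuild the example data from the question.
df <- data.frame(
  c1 = c(rep(c(1, 2, 3), 3), rep(c(3, 2, 1), 3), rep(c(2, 3, 1), 3)),
  c2 = factor(rep(c("a", "b", "c"), each = 9))
)

n_keep <- 2  # hypothetical name: number of groups to keep per category of c2

# occurrence[i] is how many times row i's (c1, c2) pair has appeared so far,
# counting row i itself.
occurrence <- ave(seq_len(nrow(df)), df$c1, df$c2, FUN = seq_along)

# Keep only rows belonging to the first n_keep groups.
res <- df[occurrence <= n_keep, ]
rownames(res) <- NULL  # renumber the rows from 1, as in the desired output
```

Because the filter only compares an occurrence count against n_keep, it does not depend on how many groups each category starts with.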

Upvotes: 2
