Reputation: 137
Suppose I have a dataset like this:
df <- data.frame(group = c(rep(1,3),rep(2,2), rep(3,2),rep(4,3),rep(5, 2)), score = c(30, 10, 22, 44, 6, 5, 20, 35, 2, 60, 14,5))
group score
1 1 30
2 1 10
3 1 22
4 2 44
5 2 6
6 3 5
7 3 20
8 4 35
9 4 2
10 4 60
11 5 14
12 5 5
I want to remove the first row of each group. The expected output should look like this:
group score
1 1 10
2 1 22
3 2 6
4 3 20
5 4 2
6 4 60
7 5 5
Is there a simple way to do this?
Upvotes: 6
Views: 7613
Reputation: 206232
Now that dplyr has slice_tail and supports the by= parameter (added in dplyr 1.1.0), you can do:
df %>% dplyr::slice_tail(n=-1, by=group)
Upvotes: 1
Reputation: 388982
An option with dplyr is to select every row except the first in each group:
library(dplyr)
df %>%
group_by(group) %>%
slice(2:n())
# group score
# <dbl> <dbl>
#1 1.00 10.0
#2 1.00 22.0
#3 2.00 6.00
#4 3.00 20.0
#5 4.00 2.00
#6 4.00 60.0
#7 5.00 5.00
Another way was shown by @Rich Scriven in a now-deleted answer:
df %>%
group_by(group) %>%
slice(-1)
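The two forms differ when a group has only one row: 2:n() then evaluates to 2:1, that is c(2, 1), so the lone row is kept, whereas slice(-1) drops it. A minimal sketch of that edge case, assuming a hypothetical data frame one_row that is not part of the question:

```r
library(dplyr)

# Hypothetical data: group 2 has a single row
one_row <- data.frame(group = c(1, 1, 2), score = c(30, 10, 44))

# For group 2, 2:n() is 2:1; index 2 is out of range and ignored, index 1 is kept
kept <- one_row %>%
  group_by(group) %>%
  slice(2:n()) %>%
  ungroup()

# slice(-1) removes the first row of every group, including single-row groups
dropped <- one_row %>%
  group_by(group) %>%
  slice(-1) %>%
  ungroup()
```

If single-row groups can occur and should disappear from the result, slice(-1) is probably the safer choice.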
Upvotes: 16
Reputation: 6117
dplyr::filter(df, group == dplyr::lag(group))
group score
1 1 10
2 1 22
3 2 6
4 3 20
5 4 2
6 4 60
7 5 5
See lead and lag in the dplyr package for more information:
https://dplyr.tidyverse.org/reference/lead-lag.html
Upvotes: 1
Reputation: 887118
Another base R option would be to check the adjacent elements:
df[c(FALSE,df$group[-1]==df$group[-nrow(df)]),]
# group score
#2 1 10
#3 1 22
#5 2 6
#7 3 20
#9 4 2
#10 4 60
#12 5 5
Here I removed the first observation in 'group' (df$group[-1]) and compared it (==) with the vector in which the last observation is removed (df$group[-nrow(df)]). As the length of the comparison is one less than the nrow of the dataset, we pad with FALSE at the top and use this as a logical index to subset the dataset.
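The steps described above can be spelled out one at a time. This is only a sketch of the same one-liner; the intermediate names (current, previous, keep) are made up for illustration, and it assumes the rows of each group are contiguous, as they are in the question's data:

```r
df <- data.frame(
  group = c(rep(1, 3), rep(2, 2), rep(3, 2), rep(4, 3), rep(5, 2)),
  score = c(30, 10, 22, 44, 6, 5, 20, 35, 2, 60, 14, 5)
)

current  <- df$group[-1]          # group of rows 2..n
previous <- df$group[-nrow(df)]   # group of rows 1..(n - 1)

# TRUE where a row belongs to the same group as the row before it
same_as_previous <- current == previous

# The comparison has n - 1 elements; pad with FALSE so row 1 is dropped
keep <- c(FALSE, same_as_previous)

result <- df[keep, ]
```

Every row that starts a new group compares unequal to the row before it, so it gets FALSE and is removed.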
Upvotes: 2
Reputation: 26446
This is quite simple with duplicated:
df[duplicated(df$group),]
group score
2 1 10
3 1 22
5 2 6
7 3 20
9 4 2
10 4 60
12 5 5
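duplicated() returns FALSE for the first occurrence of each value and TRUE for every later one, so indexing with it keeps everything except the first row per group. Because it looks at values rather than neighbours, it should also work when the rows of a group are not adjacent. A small sketch, assuming a hypothetical data frame shuffled that is not from the question:

```r
# Hypothetical data with interleaved groups
shuffled <- data.frame(group = c(1, 2, 1, 2, 3), score = c(30, 44, 10, 6, 5))

# FALSE at the first occurrence of each group value, TRUE afterwards
is_repeat <- duplicated(shuffled$group)

# Keep only rows whose group has already been seen
result <- shuffled[is_repeat, ]
```

Group 3 appears only once here, so it drops out of the result entirely.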
Upvotes: 9