Reputation: 41
I want to clean my data by deleting the last row of every group of rows that share the same type.
My data looks something like this:
type 2 3
1 A 2.3 4
2 A 3.4 5
3 B 5.5 6
4 B 6 7
5 B 3 7
6 C 5 6
....
That is, I am trying to get rid of the last entry of every group with the same type, so that the result looks like this:
type 2 3
1 A 2.3 4
2 B 5.5 6
3 B 6 7
4 C 5 6
My actual data has a different number of rows for each type, usually more than a few hundred. I thought of using group_by and then last(), but last() seems to work only with summarize. Any ideas?
Upvotes: 3
Views: 1610
Reputation: 887048
Here is another option with dplyr. After grouping by 'type', we keep a row if its row number (row_number()) is not equal to the number of rows in the group (n(), which is also the last row number), or (|) if the group has only one row (n() == 1). So, basically, we remove the last row of each group with the logical index row_number() != n(), plus an exception that keeps groups with only a single row (n() == 1).
library(dplyr)
df1 %>%
  group_by(type) %>%
  filter(row_number() != n() | n() == 1)
# type `2` `3`
# <chr> <dbl> <int>
#1 A 2.3 4
#2 B 5.5 6
#3 B 6.0 7
#4 C 5.0 6
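For anyone who wants to run this end to end, here is a minimal sketch. The answer does not show how df1 is built, so the data frame below is an assumed reconstruction of the sample rows from the question, and the ungroup() at the end is an optional addition, not part of the answer above.

```r
library(dplyr)

# Assumed reconstruction of the question's sample data; the column
# names `2` and `3` are taken from the printed header.
df1 <- data.frame(
  type = c("A", "A", "B", "B", "B", "C"),
  `2`  = c(2.3, 3.4, 5.5, 6, 3, 5),
  `3`  = c(4L, 5L, 6L, 7L, 7L, 6L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(type) %>%
  # keep every row except the last of its group,
  # but keep the only row of a single-row group
  filter(row_number() != n() | n() == 1) %>%
  ungroup()  # optional: drop the grouping so later steps see a plain tibble

print(res)
```

With this input, the single 'C' row survives because of the n() == 1 exception; without it, that group would disappear entirely.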
Upvotes: 3
Reputation: 73265
Let dat be your data frame; you may use
dat[duplicated(dat$type, fromLast = TRUE), ]
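where duplicated(, fromLast = TRUE) looks for duplicates scanning backward from the end, so the last row of each type is the only one not flagged and is therefore dropped.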
Example
set.seed(0)
dat <- data.frame(type = sort(sample(LETTERS[1:4], 12, TRUE)), x = 1:12)
# type x
#1 A 1
#2 A 2
#3 A 3
#4 B 4
#5 B 5
#6 C 6
#7 C 7
#8 C 8
#9 D 9
#10 D 10
#11 D 11
#12 D 12
dat[duplicated(dat$type, fromLast = TRUE), ]
# type x
#1 A 1
#2 A 2
#4 B 4
#6 C 6
#7 C 7
#9 D 9
#10 D 10
#11 D 11
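One thing to be aware of: a type that occurs only once is never flagged by duplicated(), so the one-liner above removes that row as well. If single-row types should be kept (as in the question's expected output, where the lone 'C' row remains), a possible variant is to also keep the first row of every type. This is a sketch on hand-made data mirroring the question's sample, not part of the answer above; `keep` is just an illustrative name.

```r
# Hand-made data mirroring the question's sample (type 'C' occurs once)
dat <- data.frame(
  type = c("A", "A", "B", "B", "B", "C"),
  x    = 1:6,
  stringsAsFactors = FALSE
)

# TRUE for rows that are not the last of their type, OR that are the
# first of their type; the only rows failing both tests are the last
# rows of types with two or more rows
keep <- duplicated(dat$type, fromLast = TRUE) | !duplicated(dat$type)

dat[keep, ]
```

A single-row type passes the second test (it is its own first occurrence), while the last row of a larger group fails both tests and is dropped.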
Upvotes: 5