Samantha

Reputation: 41

Delete the last entry of groups in a data frame

I am hoping to clean my data by deleting the last entry of every group of rows that share the same type.

My data looks something like this:

  type  2   3  
 1 A    2.3 4  
 2 A    3.4 5  
 3 B    5.5 6  
 4 B    6   7 
 5 B    3   7 
 6 C    5   6  
 ....

That is, I am trying to get rid of the last entry of every group with the same type, so that the result looks like this:

  type  2   3  
 1 A    2.3 4  
 2 B    5.5 6 
 3 B    6   7 
 4 C    5   6

In my actual data, each type has a different number of rows, usually more than a few hundred. I thought of using group_by and then last(), but that seems to work only with summarize. Any ideas?

Upvotes: 3

Views: 1610

Answers (2)

akrun

Reputation: 887048

Here is another option with dplyr. After grouping by 'type', we keep a row if its row number (row_number()) is not equal to the number of rows in the group (n(), which is also the row number of the last row), or (|) if the group has only one row (n() == 1). In other words, the logical index row_number() != n() removes the last row of each group, and n() == 1 is an exception that keeps groups consisting of a single row.

library(dplyr)
df1 %>% 
    group_by(type) %>%
    filter(row_number()!=n()|n()==1)
#  type   `2`   `3`
#  <chr> <dbl> <int>
#1     A   2.3     4
#2     B   5.5     6
#3     B   6.0     7
#4     C   5.0     6
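To see how the logical index is built, here is a minimal sketch that spells out each condition as its own column before filtering. The data frame is a hypothetical reconstruction of the sample in the question, and the column names is_last, is_only and keep are made up for illustration.

```r
library(dplyr)

# Hypothetical reconstruction of the question's sample data
df1 <- data.frame(
  type = c("A", "A", "B", "B", "B", "C"),
  `2`  = c(2.3, 3.4, 5.5, 6, 3, 5),
  `3`  = c(4L, 5L, 6L, 7L, 7L, 6L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Spell out the two conditions per row (illustrative column names)
flags <- df1 %>%
  group_by(type) %>%
  mutate(
    is_last = row_number() == n(),  # TRUE for the last row of each group
    is_only = n() == 1,             # TRUE when the group has a single row
    keep    = !is_last | is_only    # same logic as the filter() call above
  ) %>%
  ungroup()

# Keep the flagged rows and drop the helper columns again
result <- flags %>%
  filter(keep) %>%
  select(-is_last, -is_only, -keep)
```

Printing flags shows which rows are marked as the last of their group and which are kept only because of the single-row exception.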

Upvotes: 3

Zheyuan Li

Reputation: 73265

Let dat be your data frame. You may use

dat[duplicated(dat$type, fromLast = TRUE), ]

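where duplicated(, fromLast = TRUE) searches for duplicates backward, so the last occurrence of each type is the only one not flagged as a duplicate and is therefore dropped. A type that occurs only once is dropped entirely, since its single row is also its last.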


Example

set.seed(0)
dat <- data.frame(type = sort(sample(LETTERS[1:4], 12, TRUE)), x = 1:12)

#   type  x
#1     A  1
#2     A  2
#3     A  3
#4     B  4
#5     B  5
#6     C  6
#7     C  7
#8     C  8
#9     D  9
#10    D 10
#11    D 11
#12    D 12

dat[duplicated(dat$type, fromLast = TRUE), ]

#   type  x
#1     A  1
#2     A  2
#4     B  4
#6     C  6
#7     C  7
#9     D  9
#10    D 10
#11    D 11
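If types with a single row should be kept rather than dropped, one possible variation is to combine the backward and forward checks. This is only a sketch on hypothetical data (type "C" has one row), and the variable names has_later, has_earlier, keep and kept are made up for illustration.

```r
# Hypothetical data: type "C" occurs only once
dat <- data.frame(
  type = c("A", "A", "B", "B", "B", "C"),
  x = 1:6,
  stringsAsFactors = FALSE
)

has_later   <- duplicated(dat$type, fromLast = TRUE)  # a later row has the same type
has_earlier <- duplicated(dat$type)                   # an earlier row has the same type

# Keep rows that are not the last of their type, plus rows whose type is unique
keep <- has_later | (!has_later & !has_earlier)
kept <- dat[keep, ]
```

Here kept retains rows 1, 3, 4 and 6: the last row of A and of B is removed, while the lone C row survives.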

Upvotes: 5
