user1165199

Reputation: 6649

Remove row if it is same as previous row, with exception of one column

I have the following data frame:

x <- data.frame(id = c(1:6), 
                a = c('a', 'b', 'b', 'a', 'a', 'c'), 
                b = rep(2, 6), 
                c = c(5, 4, 4, 5, 5, 2))

> x
  id a b c
1  1 a 2 5
2  2 b 2 4
3  3 b 2 4
4  4 a 2 5
5  5 a 2 5
6  6 c 2 2

I want to end up with

  id a b c
1  1 a 2 5
2  2 b 2 4
4  4 a 2 5
6  6 c 2 2

The requirement is that a row should be removed if it is the same as the previous row in every column except id. If it matches a row further up the data frame, but not the one immediately before it, I do not want to remove it. For example, the row with id 4 is the same as the row with id 1 but is not removed, because that row is not immediately above it.

Any help would be appreciated.

Upvotes: 3

Views: 915

Answers (2)

akrun

Reputation: 887721

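We can do this in base R by comparing every row except the first with the row before it, ignoring the id column, and dropping the rows where nothing differs: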

x[!c(FALSE, !rowSums(x[-1, -1] != x[-nrow(x), -1])),]
#  id a b c
#1  1 a 2 5
#2  2 b 2 4
#4  4 a 2 5
#6  6 c 2 2
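To make the one-liner easier to follow, here is a rough step-by-step sketch of the same idea. The intermediate names (cur, prev, n_diff, same_as_prev, result) are only illustrative and are not part of the original answer; it assumes there are no NA values in the compared columns.

```r
x <- data.frame(id = 1:6,
                a = c('a', 'b', 'b', 'a', 'a', 'c'),
                b = rep(2, 6),
                c = c(5, 4, 4, 5, 5, 2))

# Rows 2..n and rows 1..(n-1), both without the id column,
# so that row i of `cur` lines up with its predecessor in `prev`.
cur  <- x[-1, -1]
prev <- x[-nrow(x), -1]

# Number of columns in which each row differs from the previous row.
n_diff <- rowSums(cur != prev)

# The first row has no predecessor, so it is never flagged as a duplicate.
same_as_prev <- c(FALSE, n_diff == 0)

# Keep only the rows that differ from the row immediately above.
result <- x[!same_as_prev, ]
result
```

This keeps the rows with id 1, 2, 4 and 6. If the columns can contain NA, the comparison would need extra handling, since `!=` returns NA in that case.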

Upvotes: 3

JasonWang

Reputation: 2434

Here is a way using the lag function in dplyr. The idea is to create a key column from the non-id columns and check whether it is the same as the key of the previous row.

library(dplyr)
x %>% 
  mutate(key=paste(a, b, c, sep="|")) %>%
  filter(key != lag(key, default="0")) %>% 
  select(-key)
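As a small illustration of what the pipeline is doing, the sketch below keeps the lagged key in its own column so the comparison can be inspected before filtering. The names keyed, prev_key and result are only for illustration. It assumes the separator "|" does not occur in the data, since otherwise two different rows could produce the same key.

```r
library(dplyr)

x <- data.frame(id = 1:6,
                a = c('a', 'b', 'b', 'a', 'a', 'c'),
                b = rep(2, 6),
                c = c(5, 4, 4, 5, 5, 2))

# Build the key from every column except id, then line it up
# against the key of the row above. The first row has no
# predecessor, so lag() fills it with the default "0", which
# cannot match any real key here.
keyed <- x %>%
  mutate(key = paste(a, b, c, sep = "|"),
         prev_key = lag(key, default = "0"))

# Keep rows whose key differs from the previous row's key,
# then drop the helper columns.
result <- keyed %>%
  filter(key != prev_key) %>%
  select(-key, -prev_key)
result
```

Printing keyed before the filter step shows which rows have matching key and prev_key values, which are exactly the rows that get dropped (id 3 and id 5).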

Upvotes: 2
