flozygy

Reputation: 83

Delete row if the previous row has same value/string (for each group)

For each group, I want to delete a row if its value matches the value in the previous row.

x <- c(1,1,1,1,2,2,2,2)
y <- c("A","B","B","A","A","A","B","B")
xy <- data.frame(x,y)
colnames(xy)<-c("group","value")
xy

It should result in

x <- c(1,1,1,2,2)
y <- c("A","B","A","A","B")
result_df <- data.frame(x,y)
colnames(result_df)<-c("group","value")
result_df

I think I have to apply something with lag, but I can't work out how.

Upvotes: 4

Views: 861

Answers (4)

milan

Reputation: 4970

A simple base R solution.

xy[c(0, diff(duplicated(xy)))<1,]

  group value
1     1     A
2     1     B
4     1     A
5     2     A
7     2     B
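
One caveat with the duplicated() approach: duplicated() flags a row that repeats any earlier row, not only the row directly above it. It matches the expected output on the question's data, but it can drop rows that simply reappear later. Below is a minimal base R sketch that compares each row only with its immediate predecessor. It assumes the rows of each group are contiguous, and the names `ab`, `ab_dup`, `same_as_prev` and `result` are made up for illustration.

```r
# Example data from the question
xy <- data.frame(group = c(1, 1, 1, 1, 2, 2, 2, 2),
                 value = c("A", "B", "B", "A", "A", "A", "B", "B"),
                 stringsAsFactors = FALSE)

n <- nrow(xy)
# TRUE where a row has the same group AND the same value as the row directly above it
same_as_prev <- c(FALSE,
                  xy$group[-1] == xy$group[-n] & xy$value[-1] == xy$value[-n])
result <- xy[!same_as_prev, ]
result

# Hypothetical data with no consecutive repeats at all: nothing should be dropped,
# yet the duplicated() approach removes row 3 because (1, "A") already appeared in row 1
ab <- data.frame(group = 1, value = c("A", "B", "A", "B"), stringsAsFactors = FALSE)
ab_dup <- ab[c(0, diff(duplicated(ab))) < 1, ]
ab_dup
```

If the groups can be interleaved, sorting by group first (or a grouped lag) would be needed before comparing neighbouring rows.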

Upvotes: 2

Frank

Reputation: 66819

For each group, I want to delete a row if its value matches the value in the previous row.

You can deduplicate on runs with rleidv from the data.table package:

xy[!duplicated(data.table::rleidv(xy)), ]

  group value
1     1     A
2     1     B
4     1     A
5     2     A
7     2     B

If there are other columns in xy, you'd do rleidv(xy, c("group", "value")) to dedupe only on those.
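
As a rough illustration of that last point, here is a sketch with a hypothetical extra column `id` added to the question's data (the names `xy2`, `id` and `run_id` are not from the question). It assumes the data.table package is installed.

```r
library(data.table)

# Question data plus a hypothetical extra column that differs on every row
xy2 <- data.frame(group = c(1, 1, 1, 1, 2, 2, 2, 2),
                  value = c("A", "B", "B", "A", "A", "A", "B", "B"),
                  id    = 1:8,
                  stringsAsFactors = FALSE)

# Run ids computed from group and value only; id is ignored
run_id <- rleidv(xy2, cols = c("group", "value"))

# Keep the first row of each run
dedup <- xy2[!duplicated(run_id), ]
dedup
```

Calling rleidv(xy2) without `cols` would treat every row as its own run here, since `id` never repeats, so nothing would be dropped.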

Upvotes: 1

jasbner

Reputation: 2283

You are correct that lag is an appropriate way to do this comparison. First, group_by the group column so that the comparison happens within each group. Then filter, keeping only the rows where value differs from lag(value), i.e. the previous value. The is.na check keeps the first row of each group, where lag(value) is NA.

library(dplyr)
xy %>% group_by(group) %>% filter(value != lag(value) | is.na(lag(value)))
# A tibble: 5 x 2
# Groups:   group [2]
#   group value
#   <dbl> <fct>
# 1  1.00 A    
# 2  1.00 B    
# 3  1.00 A    
# 4  2.00 A    
# 5  2.00 B 
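
To see why the is.na check is needed, it can help to materialise the lagged value as its own column first. This is only a sketch: the column name `prev` and the objects `with_prev` and `kept` are made up here, and it assumes `value` itself has no missing entries.

```r
library(dplyr)

xy <- data.frame(group = c(1, 1, 1, 1, 2, 2, 2, 2),
                 value = c("A", "B", "B", "A", "A", "A", "B", "B"),
                 stringsAsFactors = FALSE)

# prev holds the previous value within each group; the first row of a group gets NA
with_prev <- xy %>%
  group_by(group) %>%
  mutate(prev = lag(value)) %>%
  ungroup()

# Keep the first row of each group (prev is NA) and every row that differs from prev
kept <- with_prev %>%
  filter(is.na(prev) | value != prev)
kept
```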

Upvotes: 3

s_baldur

Reputation: 33488

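n <- nrow(xy)
# Compare each row with the one above it; rowMeans(...) == 1 means every column matched.
# The leading FALSE always keeps the first row.
xy[!c(FALSE, rowMeans(xy[-1, ] == xy[-n, ]) == 1), ]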
  group value
1     1     A
2     1     B
4     1     A
5     2     A
7     2     B

Upvotes: 2
