Dimitra Tsiapou

Reputation: 45

How to delete only consecutive duplicate rows?

I need to delete duplicates in my data frame ONLY when they occur in consecutive rows. I tried the distinct() function, but it deletes all duplicates, so I need code that lets me delete a row only when it duplicates the row directly above it, and only with respect to a specific column.

Here is an example of my data:

   Subject  Trial  Event_type  Code                     Time
23 VP02_RP  15     Picture     face01_n                 887969
24 VP02_RP  15     Sound       mpossound_test5          888260
25 VP02_RP  15     Picture     pospic_test5             906623
26 VP02_RP  15     Nothing     ev_mnegpos_adj_onset     928623
27 VP02_RP  15     Response    15                       958962
28 VP02_RP  18     Picture     face01_p                 987666
29 VP02_RP  18     Sound       mpossound_test6          987668
30 VP02_RP  18     Picture     negpic_test6             1006031
31 VP02_RP  18     Nothing     ev_mposnegpos_adj_onset  1028031
32 VP02_RP  18     Response    15                       1076642
33 VP02_RP  19     Response    13                       1680887

As you can see, rows 32 and 33 are two consecutive Response events, and I only want to keep the first one. In general, I want to delete every row whose Event_type is the same as the Event_type of the row directly above it.
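Stated precisely, row i should be kept when it is the first row or when its Event_type differs from that of row i - 1. A minimal base R sketch of that rule, with no packages; the small data frame below is hypothetical and only stands in for the real one:

```r
# Hypothetical toy data frame standing in for the real data (only the relevant column)
df1 <- data.frame(
  Event_type = c("Picture", "Nothing", "Response", "Response"),
  stringsAsFactors = FALSE
)

ev <- df1$Event_type
# Compare each value with the one before it; the first row is always kept
keep <- c(TRUE, ev[-1] != ev[-length(ev)])
res <- df1[keep, , drop = FALSE]
```

Applied to the full example above, this rule would drop only row 33.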

How should I go about this?

Upvotes: 2

Views: 519

Answers (3)

akrun

Reputation: 887148

An option with data.table

library(data.table)
# fill = "" stops row 1 from being dropped (its lag would otherwise be NA)
setDT(df1)[Event_type != shift(Event_type, fill = "")]
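Why the first row needs special handling: shift() has nothing to lag into row 1, so it returns NA there, and data.table treats an NA in a logical i as FALSE. A minimal sketch on a hypothetical four-row table shows the difference:

```r
library(data.table)

# Hypothetical toy data: one consecutive repeat at the end
events <- data.table(Event_type = c("Picture", "Sound", "Response", "Response"))

# shift() lags by one row; row 1 has no predecessor, so its lag is NA
lagged <- shift(events$Event_type)

# NA != "Picture" is NA, which data.table's i treats as FALSE,
# so this drops row 1 as well as the genuine repeat in row 4
too_few <- events[Event_type != shift(Event_type)]

# fill = "" gives row 1 a non-matching predecessor, so it is kept
kept <- events[Event_type != shift(Event_type, fill = "")]
```

The empty-string fill assumes "" never occurs as a real Event_type; if it might, a condition such as is.na(shift(Event_type)) | Event_type != shift(Event_type) avoids picking a sentinel.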

Upvotes: 1

jared_mamrot

Reputation: 26630

A potential tidyverse solution:

library(tidyverse)

df1 <- data.frame(
  stringsAsFactors = FALSE,
         row.names = c("23","24","25","26","27",
                       "28","29","30","31","32","33"),
           Subject = c("VP02_RP","VP02_RP","VP02_RP",
                       "VP02_RP","VP02_RP","VP02_RP","VP02_RP","VP02_RP",
                       "VP02_RP","VP02_RP","VP02_RP"),
             Trial = c(15L, 15L, 15L, 15L, 15L, 18L, 18L, 18L, 18L, 18L, 19L),
        Event_type = c("Picture","Sound","Picture",
                       "Nothing","Response","Picture","Sound","Picture",
                       "Nothing","Response","Response"),
              Code = c("face01_n","mpossound_test5",
                       "pospic_test5","ev_mnegpos_adj_onset","15","face01_p",
                       "mpossound_test6","negpic_test6",
                       "ev_mposnegpos_adj_onset","15","13"),
              Time = c(887969L,888260L,906623L,
                       928623L,958962L,987666L,987668L,1006031L,1028031L,
                       1076642L,1680887L)
)

df1 %>%
  # default = "" keeps row 1, whose lag would otherwise be NA (filter() drops NA rows)
  filter(Event_type != lag(Event_type, default = ""))
#>    Subject Trial Event_type                    Code    Time
#> 23 VP02_RP    15    Picture                face01_n  887969
#> 24 VP02_RP    15      Sound         mpossound_test5  888260
#> 25 VP02_RP    15    Picture            pospic_test5  906623
#> 26 VP02_RP    15    Nothing    ev_mnegpos_adj_onset  928623
#> 27 VP02_RP    15   Response                      15  958962
#> 28 VP02_RP    18    Picture                face01_p  987666
#> 29 VP02_RP    18      Sound         mpossound_test6  987668
#> 30 VP02_RP    18    Picture            negpic_test6 1006031
#> 31 VP02_RP    18    Nothing ev_mposnegpos_adj_onset 1028031
#> 32 VP02_RP    18   Response                      15 1076642
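dplyr's lag() likewise returns NA for the first row, and filter() discards rows where the condition evaluates to NA. If you would rather not pick a padding value, the first row can be kept explicitly with is.na(). A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data, not the question's data frame
events <- data.frame(
  Event_type = c("Picture", "Sound", "Response", "Response"),
  stringsAsFactors = FALSE
)

# Keep row 1 (no predecessor) and every row that differs from the previous one
kept <- events %>%
  filter(is.na(lag(Event_type)) | Event_type != lag(Event_type))
```

For row 1 the condition is TRUE | NA, which is TRUE, so it survives; row 4 gives FALSE | FALSE and is dropped.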

Upvotes: 4

Ronak Shah

Reputation: 388982

You can use the rleid() function from data.table, which gives every run of consecutive identical Event_type values its own ID; then use duplicated() to keep only the first row of each run.

res <- df[!duplicated(data.table::rleid(df$Event_type)), ]
res

#   Subject Trial Event_type                    Code    Time
#23 VP02_RP    15    Picture                face01_n  887969
#24 VP02_RP    15      Sound         mpossound_test5  888260
#25 VP02_RP    15    Picture            pospic_test5  906623
#26 VP02_RP    15    Nothing    ev_mnegpos_adj_onset  928623
#27 VP02_RP    15   Response                      15  958962
#28 VP02_RP    18    Picture                face01_p  987666
#29 VP02_RP    18      Sound         mpossound_test6  987668
#30 VP02_RP    18    Picture            negpic_test6 1006031
#31 VP02_RP    18    Nothing ev_mposnegpos_adj_onset 1028031
#32 VP02_RP    18   Response                      15 1076642
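To see what rleid() and duplicated() each contribute, here is a minimal sketch that runs them on the Event_type values from the question (typed in by hand, in row order 23 to 33):

```r
library(data.table)

# Event_type values from the question, rows 23..33
ev <- c("Picture", "Sound", "Picture", "Nothing", "Response",
        "Picture", "Sound", "Picture", "Nothing", "Response", "Response")

# rleid() numbers each run of consecutive identical values:
# here 1 to 10, with the last two rows sharing run 10
ids <- rleid(ev)

# duplicated() is TRUE only for the 2nd and later rows of a run,
# so negating it keeps the first row of every run
keep <- !duplicated(ids)
dropped <- which(!keep)  # position 11, i.e. the row labelled 33
```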

If you want to avoid the data.table dependency, the rleid() step can be written in base R with rle():

res <- df[!duplicated(with(rle(df$Event_type),rep(seq_along(values), lengths))),]
res
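A small sketch of what the rle() version computes, again on the question's Event_type values typed in by hand. Note that rle() expects a plain atomic vector, so a factor column would need as.character() first:

```r
# Event_type values from the question, rows 23..33
ev <- c("Picture", "Sound", "Picture", "Nothing", "Response",
        "Picture", "Sound", "Picture", "Nothing", "Response", "Response")

r <- rle(ev)
# r$values  : the value of each run (10 runs here)
# r$lengths : rows spanned by each run (all 1 except the last run, which is 2)

# Repeat each run's index once per row of that run -> one run id per row
ids <- rep(seq_along(r$values), r$lengths)

# Alternative: the row index where each run starts, usable directly as df[starts, ]
starts <- cumsum(c(1L, head(r$lengths, -1)))
```

Indexing with starts skips the duplicated() step entirely, since it already lists exactly the first row of each run.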

Upvotes: 1
