kat

Reputation: 125

How to find and delete a certain number of rows with the same consecutive value in a column in a dataframe in R?

In my dataframe there is a column with "Sound" and "Response" as values. Ideally, the pattern is two Sounds followed by one Response. But, it can happen that there are three Sounds followed by a Response.

How can I tell R to raise a flag whenever it finds this pattern in my data? I need to look at each case individually before I can delete the third Sound-row.

df <- data.frame(V1=rep("SN", 7),  
             V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"), 
             V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100), 
             stringsAsFactors=FALSE) 

V1       V3        V4
SN    Sound XYZc02i03
SN    Sound XYZq02i03
SN Response       200
SN    Sound ZYXc01i30
SN    Sound ZYXq01i30
SN    Sound ZYXc01i35
SN Response       100     

So, after finding three consecutive Sounds and deleting the last one of them (i.e. the one just before the following Response), I should have the desired pattern like this:

V1       V3        V4
SN    Sound XYZc02i03
SN    Sound XYZq02i03
SN Response       200
SN    Sound ZYXc01i30
SN    Sound ZYXq01i30
SN Response       100  

I'm sorry that I keep posting these basic questions. Any ideas are, as always, greatly appreciated!

Upvotes: 2

Views: 228

Answers (2)

Mark Miller

Reputation: 13123

I think this will work, although there are probably much simpler solutions:

df <- data.frame(V1=rep("SN", 7),  
             V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"), 
             V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100), 
             stringsAsFactors=FALSE)

df

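# Running count of consecutive "Sound" rows; resets to 0 at every other value
my.run <- rep(0, nrow(df))

my.run[1] <- if (df$V3[1] == 'Sound') 1 else 0

# seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(df)
for (i in seq_len(nrow(df))[-1]) {

     my.run[i] <- if (df$V3[i] == 'Sound') my.run[i-1] + 1 else 0

}

# Keep every row that is not the third (or later) Sound in a row
df2 <- df[my.run < 3,]
df2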
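
For what it's worth, the same running count can probably be had without the loop. This is only a sketch, not a tested replacement: it rebuilds the example data so it runs on its own, and the names run_pos and flag are made up for illustration. sequence() applied to the run lengths from rle() gives each row's position within its run, and a flag column lets you look at each case before deleting anything.

```r
df <- data.frame(V1=rep("SN", 7),
             V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"),
             V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100),
             stringsAsFactors=FALSE)

# Position of each row within its run of identical V3 values
run_pos <- sequence(rle(df$V3)$lengths)

# Same counter as the loop: count Sounds, 0 for everything else
my.run <- ifelse(df$V3 == "Sound", run_pos, 0)

# Flag instead of deleting, so each case can be inspected first
# ("flag" is a hypothetical column name)
df$flag <- my.run >= 3
df[df$flag, ]
```

Once the flagged rows have been checked, df[!df$flag, ] drops them.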

Upvotes: 2

Julius Vainora

Reputation: 48251

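r <- rle(df$V3)
cumsum(r$lengths)[r$lengths == 3 & r$values == "Sound"]
[1] 6

This returns the vector of row positions where "Sound" is the third in a row, i.e. where a run of exactly three Sounds ends. The check on r$values keeps a run of three Responses from matching as well. Now you can easily delete those rows or add a column to mark these positions.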
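
A minimal sketch of the "mark, inspect, then delete" step, assuming the example data from the question. The names ends, third, check and df_clean are made up for illustration. Indexing with a logical column rather than df[-third, ] matters here: if no run of three is found, a negative index on an empty vector would return zero rows instead of the whole data frame.

```r
df <- data.frame(V1=rep("SN", 7),
             V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"),
             V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100),
             stringsAsFactors=FALSE)

r <- rle(df$V3)
ends <- cumsum(r$lengths)                            # last row of each run
third <- ends[r$values == "Sound" & r$lengths == 3]  # last row of each 3-Sound run

# Mark the rows first ("check" is a hypothetical column name)
df$check <- FALSE
df$check[third] <- TRUE

# Look at each flagged case individually
df[df$check, ]

# After inspection, drop the flagged rows; safe even when nothing is flagged
df_clean <- df[!df$check, c("V1", "V3", "V4")]
df_clean
```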

Upvotes: 4
