How can I select only the rows that are followed by the same value in a specific column?
For example, using the values in column `v1` as the reference, go from this:
```r
v1 = c(1,1,2,3,1,2,4,1,1,2,3,4)
v2 = 1:12
v3 = c(rep("blue", 4), rep("red", 4), rep("green", 4))
df <- data.frame(v1, v2, v3)
df
```

```
   v1 v2    v3
1   1  1  blue
2   1  2  blue
3   2  3  blue
4   3  4  blue
5   1  5   red
6   2  6   red
7   4  7   red
8   1  8   red
9   1  9 green
10  2 10 green
11  3 11 green
12  4 12 green
```
to this, where only rows 1 and 8 are kept, because each of them is followed by a row whose `v1` value is also 1:
```
  v1 v2   v3
1  1  1 blue
8  1  8  red
```
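You could get the indices of interest with `which()`, comparing each value of `v1` with the value in the next row, and then extract those rows from the data frame:

```r
# Compare v1[1..n-1] with v1[2..n]: TRUE where a row is followed by the same value
indices <- which(df$v1[1:(nrow(df)-1)] == df$v1[2:nrow(df)])
df.new <- df[indices, ]
```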
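A small variation on the same idea, written as a reusable function, is sketched below; the helper name `rows_followed_by_same` is hypothetical and not part of any package. Building a "next value" vector with `c(x[-1], NA)` keeps it the same length as the column, so it avoids the `1:(nrow(df)-1)` indexing, which misbehaves (a spurious `NA` row or an error) when the data frame has fewer than two rows.

```r
# Hypothetical helper (not from either answer): keep the rows whose value
# in column `col` equals the value in the following row.
rows_followed_by_same <- function(data, col) {
  x <- data[[col]]
  # "Lead" vector: each row's next value, NA for the last row
  nxt <- c(x[-1], NA)
  # which() drops the NA produced for the last row, so it is never kept
  data[which(x == nxt), , drop = FALSE]
}

df <- data.frame(
  v1 = c(1, 1, 2, 3, 1, 2, 4, 1, 1, 2, 3, 4),
  v2 = 1:12,
  v3 = c(rep("blue", 4), rep("red", 4), rep("green", 4))
)

rows_followed_by_same(df, "v1")
#   v1 v2   v3
# 1  1  1 blue
# 8  1  8  red
```

Because the result is subset with `drop = FALSE`, a one-row or empty input simply returns an empty data frame.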
We can use `rleid` from `data.table` to get a run-length type id, use it as a grouping variable, and, if the number of rows in the group is greater than 1 (`.N > 1`), select the first observation (`head(.SD, 1L)`).
```r
library(data.table)
setDT(df)[, if (.N > 1) head(.SD, 1L), .(v1, rleid(v1))][, rleid := NULL][]
#    v1 v2   v3
# 1:  1  1 blue
# 2:  1  8  red
```
NOTE: We convert the `data.frame` to a `data.table` with `setDT(df)`, which modifies `df` in place.
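To see what `rleid()` computes, or to avoid the `data.table` dependency, the same grouping can be sketched in base R with `rle()`. This is an illustration, not the answer's code; `runs`, `run_id` and `starts` are names chosen for the example, and it assumes the goal is the first row of each run of two or more equal values.

```r
# Base-R sketch of the run-length id that data.table::rleid() produces:
# consecutive equal values share an id, and the id increases by 1 at each change.
v1 <- c(1, 1, 2, 3, 1, 2, 4, 1, 1, 2, 3, 4)

runs <- rle(v1)                                     # run values and run lengths
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
# [1]  1  1  2  3  4  5  6  7  7  8  9 10

# Row index where each run begins
starts <- cumsum(runs$lengths) - runs$lengths + 1

# First row of every run longer than 1 (the `.N > 1` / `head(.SD, 1L)` logic)
starts[runs$lengths > 1]
# [1] 1 8
```

Keep in mind that this sketch and the `rleid` answer return only the first row of each run, whereas the `which()` answer returns every row that is followed by an equal value. The two agree on this example, but for a run such as `5, 5, 5` the `which()` approach returns the first two rows and the run-based approach returns only the first.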