Joan Sales

Reputation: 23

Select only specific cases from dataframe

How can I select only the rows that are immediately followed by a row with the same value in a specific column?

That is, starting from this data frame (using the values of v1 as the reference):

v1 <- c(1, 1, 2, 3, 1, 2, 4, 1, 1, 2, 3, 4)
v2 <- 1:12
v3 <- c(rep("blue", 4), rep("red", 4), rep("green", 4))
df <- data.frame(v1, v2, v3)
df

> df
   v1 v2    v3
1   1  1  blue
2   1  2  blue
3   2  3  blue
4   3  4  blue
5   1  5   red
6   2  6   red
7   4  7   red
8   1  8   red
9   1  9 green
10  2 10 green
11  3 11 green
12  4 12 green

I want to get this, where only rows 1 and 8 are kept because each is followed by a row with the same value of v1 (here, 1):

  v1 v2   v3
1  1  1 blue
8  1  8  red

Upvotes: 1

Views: 80

Answers (2)

Christopher Brown

Reputation: 41

You could find the indices of interest with which(), by comparing each value of v1 with the next one, and then use those indices to extract the matching rows from the data frame.

# Compare every value except the last with its successor; also safe when df has a single row
indices <- which(head(df$v1, -1) == tail(df$v1, -1))
df.new <- df[indices, ]
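
The same pairwise comparison can be wrapped in a small reusable function. This is only a sketch: the helper name rows_followed_by_same and its arguments are made up for illustration and are not part of the answer above. It pads the comparison with FALSE for the last row, which has no successor, so the logical vector lines up with the rows of the data frame.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# keep the rows whose next row has the same value in column `col`.
rows_followed_by_same <- function(data, col) {
  x <- data[[col]]
  # Compare each element with its successor; the last row never qualifies.
  same_as_next <- c(x[-length(x)] == x[-1], FALSE)
  # which() drops NA comparisons; drop = FALSE keeps a data frame result.
  data[which(same_as_next), , drop = FALSE]
}

# Rebuild the example data from the question.
df <- data.frame(
  v1 = c(1, 1, 2, 3, 1, 2, 4, 1, 1, 2, 3, 4),
  v2 = 1:12,
  v3 = rep(c("blue", "red", "green"), each = 4)
)

res <- rows_followed_by_same(df, "v1")
print(res)
```

On the example data this keeps rows 1 and 8, as requested in the question.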

Upvotes: 0

akrun

Reputation: 887048

We can use rleid from data.table to get a run-length id for v1 and use it as a grouping variable. For each group, if the number of rows is greater than 1 (.N > 1), we select the first observation (head(.SD, 1L)).

library(data.table)
setDT(df)[, if (.N > 1) head(.SD, 1L), .(v1, rleid(v1))][, rleid := NULL][]
#    v1 v2   v3
# 1:  1  1 blue
# 2:  1  8  red

NOTE: We convert the 'data.frame' to 'data.table' with setDT(df).
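
For readers without data.table, the run-based idea can be approximated in base R with rle(). This is a sketch under that assumption, not the code from this answer, and the variable names are illustrative. Note that, like the grouped version above, it keeps only the first row of each run longer than one, whereas a pairwise comparison with the next row would keep every row of a run except the last. The two agree on the example because no run is longer than two.

```r
# Rebuild the example data as a plain data.frame.
df <- data.frame(
  v1 = c(1, 1, 2, 3, 1, 2, 4, 1, 1, 2, 3, 4),
  v2 = 1:12,
  v3 = rep(c("blue", "red", "green"), each = 4)
)

# Run-length encode v1: consecutive equal values form one run.
runs <- rle(df$v1)

# Row index at which each run starts.
run_starts <- cumsum(c(1L, head(runs$lengths, -1)))

# Keep the first row of every run that has more than one row.
res <- df[run_starts[runs$lengths > 1], ]
print(res)
```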

Upvotes: 2
