Reputation: 150

R: Select rows by value and always include previous row

I'm trying to subset a data frame to all rows for which a certain column value is '13', but I want all rows preceding a row with '13' to be included too, no matter the value in that column.

I do not want a row to be included twice when it both precedes a row with '13' in the specific column, but also has the value '13' itself.

Here is an example data frame and solution, whereby the condition (subset rows to rows with time = 13 and (time=13)-1, without duplicating)

ID  speed   dist    time
A   4        12     4
B   7        10     8
C   7        18     13
C   8        4      5
A   5        6      13
D   6        2      13
E   7        2      9

Becomes

ID  speed   dist    time
B   7       10      8
C   7       18      13
C   8       4       5
A   5       6       13
D   6       2       13

Upvotes: 2

Answers (3)

ThomasIsCoding

Reputation: 101247

A base R option using subset + diff

> subset(df, time == 13 | c(diff(time == 13), FALSE))
  ID speed dist time
2  B     7   10    8
3  C     7   18   13
4  C     8    4    5
5  A     5    6   13
6  D     6    2   13

Upvotes: 0

IceCreamToucan

Reputation: 28675

df <- read.table(text = '
ID  speed   dist    time
A   4        12     4
B   7        10     8
C   7        18     13
C   8        4      5
A   5        6      13
D   6        2      13
E   7        2      9
', header = TRUE)

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2

df %>% 
  filter(time == 13 | lead(time == 13))
#>   ID speed dist time
#> 1  B     7   10    8
#> 2  C     7   18   13
#> 3  C     8    4    5
#> 4  A     5    6   13
#> 5  D     6    2   13

^{Created on 2022-07-19 by the reprex package (v2.0.1)}

Upvotes: 3

akrun

Reputation: 887048

Create a position index where 'time' value is 13 using which and then subtract 1 from the index and concatenate both to subset

i1 <- which(df1$time == 13) 
ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
ind <- ind[ind >0]
df1[ind,]

-output

  ID speed dist time
2  B     7   10    8
3  C     7   18   13
4  C     8    4    5
5  A     5    6   13
6  D     6    2   13

data

df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L, 
7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L, 
2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)), 
class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 1

R: Select rows by value and always include previous row

Answers (3)

data

Related Questions