Reputation: 150
I'm trying to subset a data frame to all rows for which a certain column value is '13', but I want all rows preceding a row with '13' to be included too, no matter the value in that column.
I do not want a row to be included twice when it both precedes a row with '13' in the specific column, but also has the value '13' itself.
Here is an example data frame and solution, whereby the condition (subset rows to rows with time = 13 and (time=13)-1, without duplicating)
ID speed dist time
A 4 12 4
B 7 10 8
C 7 18 13
C 8 4 5
A 5 6 13
D 6 2 13
E 7 2 9
Becomes
ID speed dist time
B 7 10 8
C 7 18 13
C 8 4 5
A 5 6 13
D 6 2 13
Upvotes: 2
Views: 944
Reputation: 101247
A base R option using subset
+ diff
> subset(df, time == 13 | c(diff(time == 13), FALSE))
ID speed dist time
2 B 7 10 8
3 C 7 18 13
4 C 8 4 5
5 A 5 6 13
6 D 6 2 13
Upvotes: 0
Reputation: 28675
df <- read.table(text = '
ID speed dist time
A 4 12 4
B 7 10 8
C 7 18 13
C 8 4 5
A 5 6 13
D 6 2 13
E 7 2 9
', header = TRUE)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2
df %>%
filter(time == 13 | lead(time == 13))
#> ID speed dist time
#> 1 B 7 10 8
#> 2 C 7 18 13
#> 3 C 8 4 5
#> 4 A 5 6 13
#> 5 D 6 2 13
Created on 2022-07-19 by the reprex package (v2.0.1)
Upvotes: 3
Reputation: 887048
Create a position index where 'time' value is 13 using which
and then subtract 1 from the index and concatenate both to subset
i1 <- which(df1$time == 13)
ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
ind <- ind[ind >0]
df1[ind,]
-output
ID speed dist time
2 B 7 10 8
3 C 7 18 13
4 C 8 4 5
5 A 5 6 13
6 D 6 2 13
df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L,
7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L,
2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)),
class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 1