Reputation: 97
I have data that looks something like this:
df <- data.frame(station = c("A", "A", "Bad", "A", "B", "Bad", "B", "C"),
values = c(8.1, 3.3, NA, 9.1, 9.4, 6.5, 15.3, 7.8))
  station values
1       A    8.1
2       A    3.3
3     Bad     NA
4       A    9.1
5       B    9.4
6     Bad    6.5
7       B   15.3
8       C    7.8
I want to delete each row that sits directly above a row in which the station is "Bad". I will eventually delete the "Bad" rows themselves too, but I know how to do that and it is a separate question.
The output for now should look something like this:
output <- data.frame(station = c("A", "Bad", "A", "Bad", "B", "C"),
values = c(8.1, NA, 9.1, 6.5, 15.3, 7.8))
  station values
1       A    8.1
2     Bad     NA
3       A    9.1
4     Bad    6.5
5       B   15.3
6       C    7.8
So far I have been trying to use the dplyr filter function with variations similar to this:
output <- df %>%
filter(values != ([-1] == "Bad"))
I understand that the "[-1]" is not the right way to index the row above so what is the correct way to do that?
Upvotes: 0
Views: 716
Reputation: 18612
Another base R solution is:
df[-(which(df$station == "Bad") - 1),]
Output
  station values
1       A    8.1
3     Bad     NA
4       A    9.1
6     Bad    6.5
7       B   15.3
8       C    7.8
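One caveat with negative indexing: if no row is "Bad", which() returns integer(0), and df[-integer(0), ] selects zero rows instead of all of them. A minimal sketch that sidesteps this with a logical mask is below. The helper name drop_before_bad and its flag argument are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (name and arguments are assumptions, not from the answer):
# drop every row that sits directly above a flagged row, using a logical mask
# rather than negative indices.
drop_before_bad <- function(df, flag = "Bad") {
  is_bad <- !is.na(df$station) & df$station == flag
  # next_is_bad[i] is TRUE when row i + 1 is flagged; the last row has no successor.
  next_is_bad <- c(is_bad[-1], FALSE)
  df[!next_is_bad, , drop = FALSE]
}

df <- data.frame(station = c("A", "A", "Bad", "A", "B", "Bad", "B", "C"),
                 values = c(8.1, 3.3, NA, 9.1, 9.4, 6.5, 15.3, 7.8))
result <- drop_before_bad(df)

# With no "Bad" rows at all, every row is kept.
no_bad <- df[df$station != "Bad", ]
unchanged <- drop_before_bad(no_bad)
```

Here result keeps the same rows as the one-liner above, and unchanged has as many rows as no_bad.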
Upvotes: 1
Reputation: 388817
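You can use lead from dplyr, which returns the value from the next row:
library(dplyr)
# Keep a row unless the next row's station is "Bad". The last row has no next
# row, so default = last(station) fills that gap (a trailing "Bad" row would
# therefore be dropped as well).
df %>% filter(lead(station, default = last(station)) != 'Bad')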
#  station values
#1       A    8.1
#2     Bad     NA
#3       A    9.1
#4     Bad    6.5
#5       B   15.3
#6       C    7.8
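To see why the filter works, it can help to put the shifted column next to the original. The sketch below builds a base R stand-in for the lead call (it does not use dplyr itself), and the names next_station and keep are only illustrative.

```r
station <- c("A", "A", "Bad", "A", "B", "Bad", "B", "C")

# Base R stand-in for lead(station, default = last(station)):
# shift everything up by one and repeat the final value in the last slot.
next_station <- c(station[-1], station[length(station)])

# A row is kept when the row below it is not "Bad".
keep <- next_station != "Bad"

data.frame(station, next_station, keep)
```

Rows 2 and 5 are the only ones whose next_station is "Bad", so they are the ones filtered out.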
Or in base R and data.table:
# Base R
subset(df, c(tail(station, -1) != 'Bad', TRUE))
# data.table
library(data.table)
setDT(df)[shift(station, fill = last(station), type = 'lead') != 'Bad']
Upvotes: 1