Reputation: 1223
I have this data set:
data <- data.frame(trip_id = c("456B", "123A", "123A", "456B", "456B", "123A", "789C", "789C"),
comment = c("void", "", "", "", "", "void", "", "void"),
paid = c(0, 100, 100, 250, 250, 0, 125, 0))
print(data)
#trip_id comment paid
# 456B void 0
# 123A 100
# 123A 100
# 456B 250
# 456B 250
# 123A void 0
# 789C 125
# 789C void 0
I want to be able to programmatically remove rows with "void" in the comment
field, as well as a row for every trip_id
that has a corresponding row with a "void" in the comment
field. Using the example, the output would look like the following:
print(solution)
#trip_id comment paid
# 123A 100
# 456B 250
Upvotes: 1
Views: 56
Reputation: 1223
A (relatively) simple answer that I came up with after posting the question, which also accounts for scenarios where you have multiple "voids" per trip_id
or no "voids" for a given trip_id
:
df_v <- data %>%
select(trip_id, comment) %>%
filter(trip_id == "void") %>%
group_by(trip_id) %>%
mutate(indexed = row_number())
df_nv <- data %>%
filter(comment != "void") %>%
group_by(trip_id) %>%
mutate(indexed = row_number())
final <- dplyr::anti_join(df_nv, df_v, by = c("id", "indexed")) %>% select(-indexed)
Upvotes: 0
Reputation: 56149
Another dplyr solution (less elegant than @akrun's solution):
library(dplyr)
# get ids to exclude
excl <- data[ data$comment == "void", "trip_id"]
data %>%
group_by(trip_id) %>%
mutate(rn = if_else(comment == "void", NA_integer_, row_number())) %>%
filter(trip_id %in% excl & rn > min(rn, na.rm = TRUE)) %>%
ungroup() %>%
select(-rn)
# # A tibble: 2 x 3
# trip_id comment paid
# <fct> <fct> <dbl>
# 1 123A "" 100
# 2 456B "" 250
Upvotes: 2
Reputation: 887078
After doing the group_by
, get the index of the rows having 'void' in 'comment' column and remove one of the adjacent rows with slice
library(dplyr)
data %>%
group_by(trip_id) %>%
arrange(trip_id, comment != "void") %>%
slice(setdiff(row_number(), which(comment == "void") + 0:1))
# A tibble: 2 x 3
# Groups: trip_id [2]
# trip_id comment paid
# <fct> <fct> <dbl>
#1 123A "" 100
#2 456B "" 250
Upvotes: 2