bshelt141
bshelt141

Reputation: 1223

Remove certain rows based on matching

I have this data set:

data <- data.frame(trip_id = c("456B", "123A", "123A", "456B", "456B", "123A", "789C", "789C"),
                   comment = c("void", "", "", "", "", "void", "", "void"),
                   paid = c(0, 100, 100, 250, 250, 0, 125, 0))

print(data)
#trip_id comment paid 
#   456B    void    0
#   123A          100
#   123A          100
#   456B          250
#   456B          250
#   123A    void    0
#   789C          125
#   789C    void    0

I want to be able to programmatically remove rows with "void" in the comment field, as well as a row for every trip_id that has a corresponding row with a "void" in the comment field. Using the example, the output would look like the following:

print(solution)
#trip_id comment paid 
#   123A          100
#   456B          250

Upvotes: 1

Views: 56

Answers (3)

bshelt141
bshelt141

Reputation: 1223

A (relatively) simple answer that I came up with after posting the question, which also accounts for scenarios where you have multiple "voids" per trip_id or no "voids" for a given trip_id:

df_v <- data %>% 
  select(trip_id, comment) %>% 
  filter(trip_id == "void") %>% 
  group_by(trip_id) %>% 
  mutate(indexed = row_number())

df_nv <- data %>%
  filter(comment != "void") %>%
  group_by(trip_id) %>% 
  mutate(indexed = row_number())

final <- dplyr::anti_join(df_nv, df_v, by = c("id", "indexed")) %>% select(-indexed)

Upvotes: 0

zx8754
zx8754

Reputation: 56149

Another dplyr solution (less elegant than @akrun's solution):

library(dplyr)

# get ids to exclude
excl <- data[ data$comment == "void", "trip_id"]

data %>% 
  group_by(trip_id) %>% 
  mutate(rn = if_else(comment == "void", NA_integer_, row_number())) %>% 
  filter(trip_id %in% excl & rn > min(rn, na.rm = TRUE)) %>% 
  ungroup() %>% 
  select(-rn)

# # A tibble: 2 x 3
# trip_id comment  paid
# <fct>   <fct>   <dbl>
# 1 123A    ""        100
# 2 456B    ""        250

Upvotes: 2

akrun
akrun

Reputation: 887078

After doing the group_by, get the index of the rows having 'void' in 'comment' column and remove one of the adjacent rows with slice

library(dplyr)
data %>%
   group_by(trip_id) %>%
   arrange(trip_id, comment != "void") %>%
   slice(setdiff(row_number(), which(comment == "void") + 0:1))
# A tibble: 2 x 3
# Groups:   trip_id [2]
#  trip_id comment  paid
#  <fct>   <fct>   <dbl>
#1 123A    ""        100
#2 456B    ""        250

Upvotes: 2

Related Questions