Reputation: 49
I want to remove all rows from a data frame that contain certain strings. The strings - call them "abc1", "abc2", "abc3" and so forth - appear under different columns at different rows in the dataset. For example, "abc1" may appear in the first column at row 15, and then appear in the second column at row 20. I want to delete all rows that contain any of these strings. The solutions I looked at were based on a single variable containing the strings in question - how do I do this efficiently when the strings appear under more than one variable?
Upvotes: 2
Views: 685
Reputation: 887981
We may use filter with if_any to loop over the character class columns, check with str_detect whether the elements start with abc followed by one or more digits, and negate (!) so that we return only the rows without any of those elements
library(dplyr)
library(stringr)
df1 %>%
filter(!if_any(where(is.character), ~ str_detect(.x, "^abc\\d+")))
-output
  col1 col2 col3
1  ac2    3   5d
2   4d    4   3c
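If the strings to drop are known exactly rather than by a pattern, a variant of the same idea is to swap the regex for `%in%`. This is only a sketch, not part of the approach above: `targets` is a hypothetical vector name, and the data is the same `df1` used in this answer.

```r
library(dplyr)

# Same example data as in this answer
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"),
                  col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# Hypothetical vector holding the exact strings to remove
targets <- c("abc1", "abc2", "abc3")

# Keep rows where no character column equals any of the targets
res <- df1 %>%
  filter(!if_any(where(is.character), ~ .x %in% targets))
res
```

Unlike the regex version, this matches whole values only, so an element such as "abc10" would be kept unless it is listed in `targets`.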
Or using base R
subset(df1, !Reduce(`|`, lapply(Filter(is.character, df1),
grepl, pattern = "^abc\\d+")))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
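The base R one-liner packs several steps together. Unpacked, it may be easier to follow. This is the same logic split into intermediate objects; the names `chr_cols`, `hits` and `any_hit` are made up here for illustration.

```r
# Same example data as in this answer
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"),
                  col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# Step 1: keep only the character columns (col1 and col3 here)
chr_cols <- Filter(is.character, df1)

# Step 2: one logical vector per column, TRUE where the value
# starts with "abc" followed by one or more digits
hits <- lapply(chr_cols, grepl, pattern = "^abc\\d+")

# Step 3: elementwise OR across the columns gives one flag per row
any_hit <- Reduce(`|`, hits)
any_hit
# [1]  TRUE  TRUE FALSE FALSE

# Step 4: drop the flagged rows
res <- df1[!any_hit, ]
res
```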
Or we may also paste the columns of each row into a single string and check that string
subset(df1, !grepl("abc\\d+", do.call(paste, df1)))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
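To see what the paste approach is actually testing, it can help to look at the intermediate strings. A small sketch on the same `df1`; `row_strings` and `keep` are names introduced here for illustration only.

```r
# Same example data as in this answer
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"),
                  col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# Each row becomes one space-separated string
row_strings <- do.call(paste, df1)
row_strings
# [1] "abc1 1 1d"   "xyz1 2 abc3" "ac2 3 5d"    "4d 4 3c"

# The pattern is unanchored, so it matches anywhere in the row string
keep <- !grepl("abc\\d+", row_strings)
res <- df1[keep, ]
res
```

Because the pattern has no `^` here, a value such as "xabc1" would also be caught, which differs from the anchored pattern in the other two versions. Non-character columns are pasted in as well, so this is best treated as a quick shortcut rather than a drop-in equivalent.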
# Example data used in the snippets above
df1 <- structure(list(col1 = c("abc1", "xyz1", "ac2", "4d"), col2 = 1:4,
    col3 = c("1d", "abc3", "5d", "3c")), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 2