Saunok Chakrabarty

Remove all rows containing certain strings in R

I want to remove all rows from a data frame that contain certain strings. The strings (call them "abc1", "abc2", "abc3", and so forth) appear in different columns at different rows of the dataset. For example, "abc1" may appear in the first column at row 15 and then in the second column at row 20. I want to delete every row that contains any of these strings. The solutions I looked at were based on a single variable containing the strings in question. How do I do this efficiently when the strings appear in more than one variable?


Answers (1)

akrun

We may use filter with if_any to loop over the character columns, use str_detect to check whether each element starts with "abc" followed by one or more digits, and negate the result (!) so that only rows without any such element are returned.

library(dplyr)
library(stringr)
df1 %>%
   # drop rows where any character column starts with "abc" followed by digits
   filter(!if_any(where(is.character), ~ str_detect(.x, "^abc\\d+")))

Output:

   col1 col2 col3
1  ac2    3   5d
2   4d    4   3c
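If the values to drop are a fixed list of exact strings rather than a pattern, the str_detect call can be swapped for `%in%`. A minimal sketch, assuming a hypothetical vector `bad` holding the strings from the question; the example data is rebuilt with `data.frame()` so the block runs on its own:

```r
library(dplyr)

# Example data from the answer (stringsAsFactors = FALSE keeps text columns
# as character on R versions older than 4.0)
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"), col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# Hypothetical list of exact strings to drop
bad <- c("abc1", "abc2", "abc3")

# if_any() is TRUE for a row when any character column holds one of `bad`;
# negating it keeps only the rows that contain none of them
res <- df1 %>%
  filter(!if_any(where(is.character), ~ .x %in% bad))
print(res)
```

Because `%in%` compares whole values, a cell such as "xabc1" is kept, and strings containing regex metacharacters need no escaping.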

Or, using base R:

# one logical vector per character column, OR-ed together row-wise
subset(df1, !Reduce(`|`, lapply(Filter(is.character, df1),
    grepl, pattern = "^abc\\d+")))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
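A variant of the base R idea that calls `grepl` only once: coerce the character columns to a matrix, match every cell in one pass, and count hits per row with `rowSums`. This is a sketch, not benchmarked here; it may help on wide data because the regex is applied in a single vectorized call. The names `chr`, `hit` and `res` are illustrative.

```r
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"), col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# Character columns only, as a character matrix (one row per data row)
chr <- as.matrix(Filter(is.character, df1))

# grepl() returns a plain vector, so refill a matrix of the same shape;
# both the coercion and matrix() use column-major order, so cells line up
hit <- matrix(grepl("^abc\\d+", chr), nrow = nrow(chr))

# Keep rows where no character cell matched
res <- df1[rowSums(hit) == 0, , drop = FALSE]
print(res)
```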

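Or paste each row into a single string and search it. Note that this pattern is not anchored and every column is pasted, so it also drops rows where "abc" followed by digits appears inside a longer value: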

subset(df1, !grepl("abc\\d+", do.call(paste, df1)))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
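The patterns above differ in anchoring, which decides which cells count as a match. A small sketch with made-up values `x`; the fully anchored `^abc\\d+$` is an extra variant not used in the answer, shown for the case where a whole cell must be "abc" plus digits:

```r
x <- c("abc1", "xabc1", "abc1x", "abc")

starts   <- grepl("^abc\\d+", x)   # anchored at the start only
anywhere <- grepl("abc\\d+", x)    # unanchored: also matches inside a cell
exact    <- grepl("^abc\\d+$", x)  # the whole cell must be abc + digits

print(data.frame(x, starts, anywhere, exact))
```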

Data:

df1 <- structure(list(col1 = c("abc1", "xyz1", "ac2", "4d"), col2 = 1:4,
    col3 = c("1d", "abc3", "5d", "3c")), class = "data.frame",
    row.names = c(NA, -4L))
