Remove rows in data.frame when the entire row values match a regex, or match a group of values

Question

I have a data frame like this (the valid values are just examples):

df <- data.frame(a=c(" ","NO_DATA","   "," ",NA,NA,3),
                 b=c("NO_DATA","NO_DATA",""," ",NA,2," "),
                 c=c("NO_DATA","NO_DATA","","",NA,2,3),
                 d=c("NO_DATA","NO_DATA","","",NA,2,3),
                 e=c("  ","NO_DATA","","",NA,2,"NO_DATA"))

        a       b       c       d       e
1         NO_DATA NO_DATA NO_DATA          <- I want to Remove this
2 NO_DATA NO_DATA NO_DATA NO_DATA NO_DATA  <- I want to Remove this
3                                          <- I want to Remove this
4                                          <- I want to Remove this
5    <NA>    <NA>    <NA>    <NA>    <NA>  <- I want to Remove this
6    <NA>       2       2       2       2  <- Preserve
7       3               3       3 NO_DATA  <- Preserve

I need to remove every row in which all columns hold one of these values: "", " " (or any number of spaces), NA, or "NO_DATA". A row should be dropped only when every column in it matches.

I tried using subset, but my logic seems to be wrong, since even this:

subset(df, a != "NO_DATA" & b != "NO_DATA")

gives the wrong result:

    a b c d       e
3                  
4                  
7   3   3 3 NO_DATA

This is the result I want:

        a       b       c       d       e
6    <NA>       2       2       2       2
7       3               3       3 NO_DATA

I would like to use a regex because the possible values could vary.

lroha · Accepted Answer

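You can subset with the following, which keeps every row that has at least one value that is not NA, blank, or "NO_DATA":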

df[rowSums(!sapply(df, function(x) trimws(x) %in% c("", "NO_DATA") | is.na(x))) > 0, ]

     a b c d       e
6 <NA> 2 2 2       2
7    3   3 3 NO_DATA
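
Since the question asks for a regex so that the set of unwanted values can vary, here is a minimal sketch of the same idea using grepl instead of trimws and %in%. The names pattern, is_junk and cleaned are made up for illustration, and the pattern itself is only an assumption about what counts as an unwanted value; adjust it to your data.

```r
df <- data.frame(a = c(" ", "NO_DATA", "   ", " ", NA, NA, 3),
                 b = c("NO_DATA", "NO_DATA", "", " ", NA, 2, " "),
                 c = c("NO_DATA", "NO_DATA", "", "", NA, 2, 3),
                 d = c("NO_DATA", "NO_DATA", "", "", NA, 2, 3),
                 e = c("  ", "NO_DATA", "", "", NA, 2, "NO_DATA"))

# Assumed pattern: empty or whitespace-only strings, or exactly "NO_DATA"
pattern <- "^[[:space:]]*$|^NO_DATA$"

# Logical matrix: TRUE where a cell is NA or matches the pattern.
# grepl() returns FALSE for NA, so NA is tested separately.
is_junk <- sapply(df, function(x) is.na(x) | grepl(pattern, x))

# Keep rows that have at least one cell that is not junk
cleaned <- df[rowSums(!is_junk) > 0, ]
cleaned
```

To treat more values as unwanted, extend the alternation in pattern (for example by adding another anchored alternative); the subsetting step stays the same.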
