Using R: How can I delete rows when they contain certain values?

Question

I have a table (737:35) which has different values for missing observations, e.g. NA, 0, n/a or --

So, the table looks like this:

┌────┬────┬──────┬──────┬──────┬──────┬──────┐
│ X  │ Y  │ 2010 │ 2011 │ 2012 │ 2013 │ 2014 │
├────┼────┼──────┼──────┼──────┼──────┼──────┤
│ A  │ XY │ 0    │ 0    │ 0    │ 10   │ 15   │
│ A  │ XZ │ 11   │ 13   │ n/a  │ 12   │ 14   │
│ B  │ XY │ 0    │ --   │ 0    │ 7    │ --   │
│ B  │ XZ │ 15   │ 16   │ 12   │ 13   │ 14   │
│ C  │ XY │ NA   │ NA   │ NA   │ NA   │ NA   │
│ C  │ XZ │ 8    │ 12   │ 11   │ n/a  │n/a   │
└────┴────┴──────┴──────┴──────┴──────┴──────┘

I want to delete all rows which have a value of either NA, 0, n/a or --, so I end up having only row 5 in this example

I could get rid of the rows with NA using is.na()and deleted the rows with a 0 in the column 2010 using grep(); could someone tell me how I can delete all rows missing an observation in an easy way?

akrun · Accepted Answer

An option would be to use %in% to check whether values '0', 'n/a', or '--') are present in the column along with is.na using |, Reduce it to a single logical vector with | to check whether any value in the row have these characters, negate (! - meaning none of the elements in the row have that character) and use that o subset the rows

i1 <- !Reduce(`|`, lapply(df1, function(x) is.na(x)| x %in% c(0, "n/a", "--")))
df2 <- df1[i1, , drop = FALSE]
#   X  Y 2010 2011 2012 2013 2014
#4 B XZ   15   16   12   13   14

Or using filter_all

library(dplyr)
df1 %>%
     filter_all(all_vars(!(is.na(.) | . %in% c(0, "n/a", "--"))))
#   X  Y 2010 2011 2012 2013 2014
#1 B XZ   15   16   12   13   14

data

df1 <- structure(list(X = c("A", "A", "B", "B", "C", "C"), Y = c("XY", 
"XZ", "XY", "XZ", "XY", "XZ"), `2010` = c(0L, 11L, 0L, 15L, NA, 
8L), `2011` = c("0", "13", "--", "16", NA, "12"), `2012` = c("0", 
"n/a", "0", "12", NA, "11"), `2013` = c("10", "12", "7", "13", 
NA, "n/a"), `2014` = c("15", "14", "--", "14", NA, "n/a")),
class = "data.frame", row.names = c(NA, 
-6L))

Using R: How can I delete rows when they contain certain values?

Answers (2)

data

Related Questions