Lila
Lila

Reputation: 59

Using R: How can I delete rows when they contain certain values?

I have a table (737:35) which has different values for missing observations, e.g. NA, 0, n/a or --

So, the table looks like this:

┌────┬────┬──────┬──────┬──────┬──────┬──────┐
│ X  │ Y  │ 2010 │ 2011 │ 2012 │ 2013 │ 2014 │
├────┼────┼──────┼──────┼──────┼──────┼──────┤
│ A  │ XY │ 0    │ 0    │ 0    │ 10   │ 15   │
│ A  │ XZ │ 11   │ 13   │ n/a  │ 12   │ 14   │
│ B  │ XY │ 0    │ --   │ 0    │ 7    │ --   │
│ B  │ XZ │ 15   │ 16   │ 12   │ 13   │ 14   │
│ C  │ XY │ NA   │ NA   │ NA   │ NA   │ NA   │
│ C  │ XZ │ 8    │ 12   │ 11   │ n/a  │n/a   │
└────┴────┴──────┴──────┴──────┴──────┴──────┘

I want to delete all rows which have a value of either NA, 0, n/a or --, so I end up having only row 5 in this example

I could get rid of the rows with NA using is.na()and deleted the rows with a 0 in the column 2010 using grep(); could someone tell me how I can delete all rows missing an observation in an easy way?

Upvotes: 1

Views: 86

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269481

Use apply to go row by row checking whether any element of that row is in the character vector na.strings. This gives a logical vector of one element per row which where TRUE means the corresponding row has an element found in na.strings. Negate that logical vector and subscript df1 by it.

na.strings <- c(0, NA, "NA", "--", "n/a")
df1[ !apply(df1, 1, function(x) any(x %in% na.strings)), ]

This does not distinguish between 0 and "0" or between NA and "NA" but I suspect you want all those to be regarded as missing. If you did want to distinguish between NA and "NA" and only regard NA but not "NA" as missing then omit "NA" from na.strings.

Note that if you are reading this in from a text file then you could specify na.strings in your read.table command:

na.omit(read.table("myfile", na.strings = c("NA", "0", "n/a", "--"), ...whatever...))

Upvotes: 3

akrun
akrun

Reputation: 886998

An option would be to use %in% to check whether values '0', 'n/a', or '--') are present in the column along with is.na using |, Reduce it to a single logical vector with | to check whether any value in the row have these characters, negate (! - meaning none of the elements in the row have that character) and use that o subset the rows

i1 <- !Reduce(`|`, lapply(df1, function(x) is.na(x)| x %in% c(0, "n/a", "--")))
df2 <- df1[i1, , drop = FALSE]
#   X  Y 2010 2011 2012 2013 2014
#4 B XZ   15   16   12   13   14

Or using filter_all

library(dplyr)
df1 %>%
     filter_all(all_vars(!(is.na(.) | . %in% c(0, "n/a", "--"))))
#   X  Y 2010 2011 2012 2013 2014
#1 B XZ   15   16   12   13   14

data

df1 <- structure(list(X = c("A", "A", "B", "B", "C", "C"), Y = c("XY", 
"XZ", "XY", "XZ", "XY", "XZ"), `2010` = c(0L, 11L, 0L, 15L, NA, 
8L), `2011` = c("0", "13", "--", "16", NA, "12"), `2012` = c("0", 
"n/a", "0", "12", NA, "11"), `2013` = c("10", "12", "7", "13", 
NA, "n/a"), `2014` = c("15", "14", "--", "14", NA, "n/a")),
class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 4

Related Questions