Reputation: 59
I have a table (737:35) which has different values for missing observations, e.g. NA, 0, n/a or --
So, the table looks like this:
┌────┬────┬──────┬──────┬──────┬──────┬──────┐
│ X │ Y │ 2010 │ 2011 │ 2012 │ 2013 │ 2014 │
├────┼────┼──────┼──────┼──────┼──────┼──────┤
│ A │ XY │ 0 │ 0 │ 0 │ 10 │ 15 │
│ A │ XZ │ 11 │ 13 │ n/a │ 12 │ 14 │
│ B │ XY │ 0 │ -- │ 0 │ 7 │ -- │
│ B │ XZ │ 15 │ 16 │ 12 │ 13 │ 14 │
│ C │ XY │ NA │ NA │ NA │ NA │ NA │
│ C │ XZ │ 8 │ 12 │ 11 │ n/a │n/a │
└────┴────┴──────┴──────┴──────┴──────┴──────┘
I want to delete all rows which have a value of either NA, 0, n/a or --, so I end up having only row 5 in this example
I could get rid of the rows with NA using is.na()
and deleted the rows with a 0 in the column 2010 using grep()
; could someone tell me how I can delete all rows missing an observation in an easy way?
Upvotes: 1
Views: 86
Reputation: 269481
Use apply
to go row by row checking whether any element of that row is in the character vector na.strings
. This gives a logical vector of one element per row which where TRUE means the corresponding row has an element found in na.strings
. Negate that logical vector and subscript df1
by it.
na.strings <- c(0, NA, "NA", "--", "n/a")
df1[ !apply(df1, 1, function(x) any(x %in% na.strings)), ]
This does not distinguish between 0 and "0" or between NA and "NA" but I suspect you want all those to be regarded as missing. If you did want to distinguish between NA and "NA" and only regard NA but not "NA" as missing then omit "NA" from na.strings
.
Note that if you are reading this in from a text file then you could specify na.strings
in your read.table
command:
na.omit(read.table("myfile", na.strings = c("NA", "0", "n/a", "--"), ...whatever...))
Upvotes: 3
Reputation: 886998
An option would be to use %in%
to check whether values '0', 'n/a', or '--') are present in the column along with is.na
using |
, Reduce
it to a single logical vector
with |
to check whether any value in the row have these characters, negate (!
- meaning none of the elements in the row have that character) and use that o subset the rows
i1 <- !Reduce(`|`, lapply(df1, function(x) is.na(x)| x %in% c(0, "n/a", "--")))
df2 <- df1[i1, , drop = FALSE]
# X Y 2010 2011 2012 2013 2014
#4 B XZ 15 16 12 13 14
Or using filter_all
library(dplyr)
df1 %>%
filter_all(all_vars(!(is.na(.) | . %in% c(0, "n/a", "--"))))
# X Y 2010 2011 2012 2013 2014
#1 B XZ 15 16 12 13 14
df1 <- structure(list(X = c("A", "A", "B", "B", "C", "C"), Y = c("XY",
"XZ", "XY", "XZ", "XY", "XZ"), `2010` = c(0L, 11L, 0L, 15L, NA,
8L), `2011` = c("0", "13", "--", "16", NA, "12"), `2012` = c("0",
"n/a", "0", "12", NA, "11"), `2013` = c("10", "12", "7", "13",
NA, "n/a"), `2014` = c("15", "14", "--", "14", NA, "n/a")),
class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 4