Reputation: 1140
I would like to drop any occurrence of a factor level for which one row contains a missing value
Example:
ID var1 var2
1 1 2
1 NA 3
2 1 2
2 2 4
So, in this hypothetical, what would be left would be:
ID var1 var2
2 1 2
2 2 4
Upvotes: 3
Views: 162
Reputation: 118779
In the current development version of data.table
, there's a new implementation of na.omit
for data.tables, which takes a cols =
and invert =
arguments.
The cols =
allows to specify the columns on which to look for NAs
. And invert = TRUE
returns the NA rows instead, instead of omitting them.
You can install the devel version by following these instructions. Or you can wait for 1.9.6 on CRAN at some point. Using that, we can do:
require(data.table) ## 1.9.5+
setkey(setDT(df), ID)
df[!na.omit(df, invert = TRUE)]
# ID var1 var2
# 1: 2 1 2
# 2: 2 2 4
How this works:
setDT
converts data.frame to data.table by reference.
setkey
sorts the data.table by the columns provided and marks those columns as key columns so that we can perform a join.
na.omit(df, invert = TRUE)
gives just those rows that have NA
anywhere.
X[!Y]
does an anit-join by joining on the key column ID
, and returns all the rows that don't match ID = 1
(from Y
). Check this post to read in detail about data.table's joins.
HTH
Upvotes: 2
Reputation: 70256
Here's one more option in base R. It will check all columns for NAs.
df[!df$ID %in% df$ID[rowSums(is.na(df)) > 0],]
# ID var1 var2
#3 2 1 2
#4 2 2 4
If you only want to check in column "var1" you can do:
df[!with(df, ID %in% ID[is.na(var1)]),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
Upvotes: 3
Reputation: 886948
Assuming that NAs
would occur in both var
columns,
df[with(df, !ave(!!rowSums(is.na(df[,-1])), ID, FUN=any)),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
Or if it is only specific to var1
df[with(df, !ave(is.na(var1), ID, FUN=any)),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
Or using dplyr
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(!is.na(var1)))
# ID var1 var2
#1 2 1 2
#2 2 2 4
df <- structure(list(ID = c(1L, 1L, 2L, 2L), var1 = c(1L, NA, 1L, 2L
), var2 = c(2L, 3L, 2L, 4L)), .Names = c("ID", "var1", "var2"
), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 3
Reputation: 92282
Hers's possible data.table
solution (sorry @rawr)
library(data.table)
setDT(df)[, if (all(!is.na(.SD))) .SD, ID]
# ID var1 var2
# 1: 2 1 2
# 2: 2 2 4
If you only want to check var1
then
df[, if (all(!is.na(var1))) .SD, ID]
# ID var1 var2
# 1: 2 1 2
# 2: 2 2 4
Upvotes: 4