Reputation: 1359
I have the following data frame:
i3<-c(1,1,1,1,2,2)
i2<-c(NA,1,1,1,2,2)
i1<-c(1,NA,2,4,5,3)
newdat1<-data.frame(i3,i2,i1)
print(newdat1)
  i3 i2 i1
1  1 NA  1
2  1  1 NA
3  1  1  2
4  1  1  4
5  2  2  5
6  2  2  3
I realize the solution for this is probably quite simple, but I am trying to return all the columns that contain any NA, so that the final result looks like:
  i2 i1
1 NA  1
2  1 NA
3  1  2
4  1  4
5  2  5
6  2  3
I have found the following code which does the opposite:
newdat1<-newdat1[, sapply(newdat1, Negate(anyNA)), drop = FALSE]
But I cannot find exactly what I am looking for. Thank you.
Upvotes: 2
Views: 166
Reputation: 92292
I just want to point out that the OP's own approach is actually the best one (as I expected), because apply and colSums convert the whole data.frame to a matrix, while the complete.cases solution transposes the whole data set. The OP's sapply solution (with the Negate wrapper dropped) works on the column vectors without transforming the whole data set, and anyNA is a primitive function. Here are some benchmarks on a bigger data set:
set.seed(123)
bidData <- as.data.frame(replicate(1e4, sample(c(NA, 1:3), 1e4, replace = TRUE)))
library(microbenchmark)
microbenchmark(
  mpalanco = bidData[, !complete.cases(t(bidData)), drop = FALSE],
  mikechir = bidData[, is.na(colSums(bidData)), drop = FALSE],
  sabddem  = bidData[, !apply(bidData, 2, function(x) sum(is.na(x)) == 0), drop = FALSE],
  OP       = bidData[, sapply(bidData, anyNA), drop = FALSE]
)
# Unit: milliseconds
#      expr       min         lq       mean     median         uq        max neval
#  mpalanco 2347.0316 2401.32940 2434.24480 2421.22703 2449.32975 2972.82020   100
#  mikechir  352.8597  363.01980  425.11366  403.58777  477.06792  799.15855   100
#   sabddem 1869.2324 2025.22459 2591.11786 2812.56430 2853.55268 3655.91325   100
#        OP   17.5455   18.25625   18.99749   18.65456   19.54728   25.36552   100
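To see why the sapply approach avoids the conversion, here is a minimal sketch on the small newdat1 from the question. The mixed data frame is a hypothetical example that is not part of the question; it is only there to illustrate what coercing a data.frame to a matrix does.

```r
# Data from the question
newdat1 <- data.frame(i3 = c(1, 1, 1, 1, 2, 2),
                      i2 = c(NA, 1, 1, 1, 2, 2),
                      i1 = c(1, NA, 2, 4, 5, 3))

# A data.frame is a list of columns, so sapply() visits each column as a
# plain vector and anyNA() returns one logical per column
has_na <- sapply(newdat1, anyNA)
print(has_na)

# Keep only the columns flagged TRUE
res <- newdat1[, has_na, drop = FALSE]
print(names(res))

# Hypothetical mixed-type data frame (an assumption, for illustration only)
mixed <- data.frame(x = c(1, NA), y = c("a", "b"))

# apply() goes through as.matrix(), which coerces every column to
# character for this data frame
print(typeof(as.matrix(mixed)))

# The column-wise approach needs no coercion at all
mixed_na <- sapply(mixed, anyNA)
print(mixed_na)
```

The drop = FALSE matters when only one column contains an NA: without it, the result would be simplified to a vector instead of a one-column data frame.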
Upvotes: 2
Reputation: 34703
Using base R and colSums:
newdat1[, is.na(colSums(newdat1)), drop = FALSE]
  i2 i1
1 NA  1
2  1 NA
3  1  2
4  1  4
5  2  5
6  2  3
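This works because NA propagates through a sum: any column that contains an NA ends up with an NA column total. A small sketch of the intermediate step, using the data from the question (the names totals, keep and res are only illustrative):

```r
# Data from the question
newdat1 <- data.frame(i3 = c(1, 1, 1, 1, 2, 2),
                      i2 = c(NA, 1, 1, 1, 2, 2),
                      i1 = c(1, NA, 2, 4, 5, 3))

# Column totals: i3 gets a numeric total, i2 and i1 become NA
totals <- colSums(newdat1)
print(totals)

# Columns whose total is NA are the ones that contain at least one NA
keep <- is.na(totals)
res <- newdat1[, keep, drop = FALSE]
print(res)
```

Note that colSums requires every column to be numeric, so this only applies to all-numeric data frames like the one here.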
Upvotes: 1
Reputation: 13570
newdat1[!complete.cases(t(newdat1))]
Output:
  i2 i1
1 NA  1
2  1 NA
3  1  2
4  1  4
5  2  5
6  2  3
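In case the one-liner is hard to read, here is a sketch of what it does step by step, on the data from the question (tr, complete_cols and res are just illustrative names): t() turns each column into a row, and complete.cases() then reports which of those rows have no NA.

```r
# Data from the question
newdat1 <- data.frame(i3 = c(1, 1, 1, 1, 2, 2),
                      i2 = c(NA, 1, 1, 1, 2, 2),
                      i1 = c(1, NA, 2, 4, 5, 3))

# Transposing gives a 3 x 6 matrix with one row per original column
tr <- t(newdat1)
print(dim(tr))

# TRUE for rows (original columns) that contain no NA
complete_cols <- complete.cases(tr)
print(complete_cols)

# Negate to keep the columns that do contain an NA
res <- newdat1[!complete_cols]
print(res)
```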
Upvotes: 5
Reputation: 7190
A solution with apply and subsetting:
ind <- apply(newdat1, 2, function(x) sum(is.na(x)) == 0 )
newdat1[!ind]
  i2 i1
1 NA  1
2  1 NA
3  1  2
4  1  4
5  2  5
6  2  3
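For reference, this is what the index looks like on the data from the question, together with a possible variant that counts the NAs per column with colSums(is.na(...)) instead of apply. The variant is only a sketch of the same idea, and na_counts and res are illustrative names.

```r
# Data from the question
newdat1 <- data.frame(i3 = c(1, 1, 1, 1, 2, 2),
                      i2 = c(NA, 1, 1, 1, 2, 2),
                      i1 = c(1, NA, 2, 4, 5, 3))

# TRUE for columns without any NA
ind <- apply(newdat1, 2, function(x) sum(is.na(x)) == 0)
print(ind)

# Variant: is.na() on a data.frame gives a logical matrix, and colSums()
# counts the TRUE values (the NAs) in each column
na_counts <- colSums(is.na(newdat1))
print(na_counts)

# Keep the columns with at least one NA
res <- newdat1[na_counts > 0]
print(res)
```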
Upvotes: 1