TTT

Reputation: 4434

Remove columns from a data frame where some of the values are NA

I have a data frame in which some of the values are NA. I would like to remove the columns that contain them.

My data.frame looks like this:

    v1   v2 
1    1   NA 
2    1    1 
3    2    2 
4    1    1 
5    2    2 
6    1   NA

I tried to compute the column means and select the columns whose mean is not NA. I tried this statement, but it does not work:

data=subset(Itun, select=c(is.na(colMeans(Itun))))

I got an error:

error : 'x' must be an array of at least two dimensions

Can anyone give me some help?

Upvotes: 46

Views: 120814

Answers (8)

user2110417

Reputation:

You can also try the following, which drops only the columns in which every value is NA:

df <- df[,colSums(is.na(df))<nrow(df)]
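
The condition `< nrow(df)` keeps any column with at least one non-NA value, whereas `== 0` keeps only columns with no NA at all. A minimal sketch of the difference, assuming a made-up data frame `d` (hypothetical, not the asker's data):

```r
# Hypothetical example data: y is entirely NA, z has a single NA
d <- data.frame(x = 1:3, y = NA, z = c(1, NA, 3))

# Number of NA values per column
na_counts <- colSums(is.na(d))

# Drops only the columns in which every value is NA
all_na_dropped <- d[, na_counts < nrow(d), drop = FALSE]
names(all_na_dropped)

# Drops every column that contains at least one NA
any_na_dropped <- d[, na_counts == 0, drop = FALSE]
names(any_na_dropped)
```

For the question as asked (remove columns where some values are NA), the second form is the one that applies.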

Upvotes: 0

HBat

Reputation: 5702

Alternatively, select(where(~FUNCTION)) can be used:

library(dplyr)

(df <- data.frame(x = letters[1:5], y = NA, z = c(1:4, NA)))
#>   x  y  z
#> 1 a NA  1
#> 2 b NA  2
#> 3 c NA  3
#> 4 d NA  4
#> 5 e NA NA

# Remove columns where all values are NA
df %>% 
  select(where(~!all(is.na(.))))
#>   x  z
#> 1 a  1
#> 2 b  2
#> 3 c  3
#> 4 d  4
#> 5 e NA
  
# Remove columns with at least one NA  
df %>% 
  select(where(~!any(is.na(.))))
#>   x
#> 1 a
#> 2 b
#> 3 c
#> 4 d
#> 5 e

Upvotes: 18

Matt Dancho

Reputation: 7308

Here's a convenient way to do it using the dplyr function select_if(). Combining not (!), any() and is.na() selects all columns that don't contain any NA values.

library(dplyr)
Itun %>%
    select_if(~ !any(is.na(.)))

Upvotes: 47

Oriol Prat

Reputation: 1047

Another alternative, which needs only base R rather than dplyr, is the Filter function:

Filter(function(x) !any(is.na(x)), Itun)

With data.table it is a little more cumbersome:

library(data.table)
# setDT() converts Itun to a data.table by reference
setDT(Itun)[,.SD,.SDcols=setdiff((1:ncol(Itun)),
                                which(colSums(is.na(Itun))>0))]

Upvotes: 2

lmo

Reputation: 38520

A base R method related to the apply answers is

Itun[!unlist(vapply(Itun, anyNA, logical(1)))]
  v1
1  1
2  1
3  2
4  1
5  2
6  1

Here, vapply is used because we are operating on a list and, unlike apply, it does not coerce the object into a matrix. Also, since we know that the output for each column will be a logical vector of length 1, we can tell vapply this and potentially get a little speed boost. For the same reason, I used anyNA instead of any(is.na()).
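
The same steps can be split up to make the intermediate result visible. This is only a sketch on the data from the question; the names `has_na` and `complete` are illustrative.

```r
# Data from the question
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# vapply visits each column as-is and must return exactly one logical per column
has_na <- vapply(Itun, anyNA, logical(1))
has_na

# List-style indexing with the negated flags keeps only the complete columns
# and always returns a data frame
complete <- Itun[!has_na]
names(complete)
```

Because vapply already returns a plain logical vector, the unlist() call in the one-liner is not strictly needed.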

Upvotes: 2

Scott Worland

Reputation: 1402

You can use transpose twice:

newdf <- t(na.omit(t(df)))
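
One caveat worth checking before relying on this: t() converts the data frame to a matrix, so the result is a matrix rather than a data frame, and columns of mixed types would be coerced to a common type. A short sketch on the question's data (the name `back` is illustrative):

```r
# Data from the question
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# Transpose, drop the rows (former columns) containing NA, transpose back
newdf <- t(na.omit(t(Itun)))
is.matrix(newdf)
colnames(newdf)

# Convert back if a data frame is needed downstream
back <- as.data.frame(newdf)
is.data.frame(back)
```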

Upvotes: 14

Backlin

Reputation: 14872

data[,!apply(is.na(data), 2, any)]

Upvotes: 6

Sven Hohenstein

Reputation: 81743

The data:

Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA)) 

This will remove all columns containing at least one NA:

Itun[ , colSums(is.na(Itun)) == 0]

An alternative way is to use apply:

Itun[ , apply(Itun, 2, function(x) !any(is.na(x)))]
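
One thing to watch with the `[ , cols]` form: when only a single column survives, as with the example data, R simplifies the result to a plain vector by default. A minimal sketch of keeping a data frame with `drop = FALSE` (the names `kept` and `kept_df` are illustrative):

```r
# Data from the question
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# Only v1 is free of NA, so the default indexing returns a numeric vector
kept <- Itun[ , colSums(is.na(Itun)) == 0]
is.data.frame(kept)

# drop = FALSE keeps the data frame structure even for a single column
kept_df <- Itun[ , colSums(is.na(Itun)) == 0, drop = FALSE]
is.data.frame(kept_df)
names(kept_df)
```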

Upvotes: 71
