Paolo RLang

Reputation: 1704

R language check missing data for columns and rows

I have a data frame `sells` and I want to check for missing data in both its rows and its columns.

What I did for rows is:

sells[complete.cases(sells), ]

nrow(sells[complete.cases(sells), ])

but I don't know how to solve it for columns.

Help please

Upvotes: 2

Views: 88

Answers (2)

Dominic Comtois

Reputation: 10421

First, let's take the iris data frame and randomly insert some NAs:

iris.demo <- iris
# Logical mask with roughly 10% TRUE, same shape as iris (150 rows x 5 columns)
iris.nas <- matrix(sample(c(FALSE, TRUE), size = 150 * 5,
                          prob = c(.9, .1), replace = TRUE), ncol = 5)
iris.demo[iris.nas] <- NA

For rows, it is pretty straightforward:

sum(complete.cases(iris.demo))
# [1] 75
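
If you also want to know which rows are incomplete and how many values each one is missing, something along these lines might help. This is only a minimal sketch on a made-up data frame (`df` and its columns are hypothetical, not from the question):

```r
# Hypothetical toy data frame with known NA positions
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"),
                 c = c(TRUE, NA, FALSE, TRUE))

# Indices of the rows that contain at least one NA
incomplete_rows <- which(!complete.cases(df))

# Number of missing values in each row
na_per_row <- rowSums(is.na(df))

print(incomplete_rows)
print(na_per_row)
```

Here `is.na(df)` returns a logical matrix, so `rowSums()` simply counts the `TRUE` entries per row.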

For columns, here are two possibilities (among others):

  1. Transposing the whole dataframe

    sum(complete.cases(t(iris.demo)))
    # [1] 0   # 0 columns are complete
    
  2. Using lapply to count the "non-missing" on every column and see if it's equal to nrow:

    sum(lapply(iris.demo, function(x) sum(!is.na(x))) == nrow(iris.demo))
    # [1] 0
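
A third option that avoids transposing (which coerces a mixed-type data frame to a character matrix) is `colSums(is.na(...))`. A rough sketch, again on a hypothetical toy data frame rather than the real data:

```r
# Hypothetical toy data frame: column b is complete, a and c are not
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", "z", "w"),
                 c = c(TRUE, NA, NA, TRUE))

# Number of missing values in each column
na_per_col <- colSums(is.na(df))

# Names and count of the columns with no missing values
complete_cols <- names(df)[na_per_col == 0]
n_complete_cols <- sum(na_per_col == 0)

print(na_per_col)
print(complete_cols)
```

This gives the per-column NA counts as a by-product, which is often what you want to look at first anyway.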
    

Upvotes: 1

lukeA

Reputation: 54287

You could do it like this:

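set.seed(1)
# The exact values drawn depend on the R version (the default sampler changed in R 3.6.0)
(sells <- data.frame(replicate(2, sample(c(1:3, NA), 10, replace = TRUE)), x3 = 1:10))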
#    X1 X2 x3
# 1  NA  2  1
# 2   1  3  2
# 3   3  2  3
# 4   1  1  4
# 5   2 NA  5
# 6   2  3  6
# 7   1 NA  7
# 8   2  1  8
# 9  NA  3  9
# 10  2  2 10

Rows:

sells[complete.cases(sells), ]
#    X1 X2 x3
# 2   1  3  2
# 3   3  2  3
# 4   1  1  4
# 6   2  3  6
# 8   2  1  8
# 10  2  2 10
nrow(sells[complete.cases(sells), ])
# [1] 6

Columns that contain at least one NA:

sells[, sapply(sells, function(col) any(is.na(col)))]
#    X1 X2
# 1  NA  2
# 2   1  3
# 3   3  2
# 4   1  1
# 5   2 NA
# 6   2  3
# 7   1 NA
# 8   2  1
# 9  NA  3
# 10  2  2
sum(sapply(sells, function(col) any(is.na(col))))
# [1] 2
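
If you want the opposite selection (only the columns without any NA), you could negate the same logical vector. A small sketch, assuming a made-up three-row `sells` (the values below are hypothetical); `anyNA()` is base R's shortcut for `any(is.na(x))`:

```r
# Hypothetical small data frame: X1 and X2 contain NAs, x3 does not
sells <- data.frame(X1 = c(NA, 1, 3),
                    X2 = c(2, NA, 2),
                    x3 = 1:3)

# TRUE for every column that has at least one missing value
has_na <- sapply(sells, anyNA)

# Keep only the complete columns; drop = FALSE keeps the result a
# data frame even when a single column is left
complete_only <- sells[, !has_na, drop = FALSE]

print(has_na)
print(complete_only)
```

Without `drop = FALSE`, selecting a single column this way returns a plain vector instead of a data frame, which can break later code that expects `nrow()` or `names()` to work.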

Upvotes: 0
