Mar
Mar

Reputation: 15

How to count the number of rows with NA values in specific columns?

I have a dataframe with 502543 obs. of 9 variables including the ID (which is repeated several times). I need to count how many rows have NA values in all variables except in ID. I don't want to delete this ID column, as later I will need to count n_distinct(ID), that's why I am looking for a method to count rows with NA values in all columns except in this one.

My dataframe looks like this sample:

ID neckpain backpain kneepain
1  Yes       NA       NA
2  NA        NA       NA
3  Yes       Yes      Yes
2  NA        NA       NA
3  Yes       Yes      Yes
4  NA        NA       NA

The outcome I'm trying to obtain would be nrows: 3

Thanks in advance

Upvotes: 0

Views: 914

Answers (4)

Sotos
Sotos

Reputation: 51582

Assuming that ID is your first column, then

sum(rowSums(is.na(df[-1])) == ncol(df[-1]))
#[1] 3

If you want to look at it from the opposite direction (i.e. 0 columns with non-NA), then you can use suggestion by @RonakShah,

sum(rowSums(!is.na(df[-1])) == 0)

Upvotes: 2

Darren Tsai
Darren Tsai

Reputation: 35554

For a supplement, it's a dplyr solution.

library(dplyr)

df %>% filter(across(-ID, is.na)) %>% count()

#   n
# 1 3

Upvotes: 2

Matt Dzievit
Matt Dzievit

Reputation: 527

Keeping in the tidyverse world (assumed since you wanted to use n_distinct)

library(tidyverse)

##Your data
data <- tibble(ID = c(1,2,3,2,3,4),
               neckpain = c('Yes',NA,'Yes',NA,'Yes',NA),
               backpain = c(NA,NA,'Yes',NA,'Yes',NA),
               kneepain = c(NA,NA,'Yes',NA,'Yes',NA)) 

##Pull out ones are missing across ID and count the rows if you want to cherry pick columns
nrow(data %>% 
  rowwise() %>% 
  mutate(row_total = sum(is.na(neckpain),
                         is.na(backpain),
                         is.na(kneepain))) %>% 
  filter(row_total == 3))

[1] 3

##Or if you just want to do it across all rows as noted in the comments
nrow(data %>% 
       mutate(row_total = rowSums(is.na(.[2:4]))) %>% 
       filter(row_total == 3))
[1] 3

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76402

Here is a one-liner:

sum(apply(df1[-1], 1, function(x) all(is.na(x))))
#[1] 3

Data

df1 <- read.table(text = "
ID neckpain backpain kneepain
1  Yes       NA       NA
2  NA        NA       NA
3  Yes       Yes      Yes
2  NA        NA       NA
3  Yes       Yes      Yes
4  NA        NA       NA
", header = TRUE)

Upvotes: 1

Related Questions