Reputation: 15
I have a dataframe
with 502543 obs. of 9 variables including the ID (which is repeated several times). I need to count how many rows have NA
values in all variables except in ID
. I don't want to delete this ID column, as later I will need to count n_distinct(ID)
, that's why I am looking for a method to count rows with NA values in all columns except in this one.
My dataframe looks like this sample:
ID neckpain backpain kneepain
1 Yes NA NA
2 NA NA NA
3 Yes Yes Yes
2 NA NA NA
3 Yes Yes Yes
4 NA NA NA
The outcome I'm trying to obtain would be nrows: 3
Thanks in advance
Upvotes: 0
Views: 914
Reputation: 51582
Assuming that ID
is your first column, then
sum(rowSums(is.na(df[-1])) == ncol(df[-1]))
#[1] 3
If you want to look at it from the opposite direction (i.e. 0 columns with non-NA), then you can use suggestion by @RonakShah,
sum(rowSums(!is.na(df[-1])) == 0)
Upvotes: 2
Reputation: 35554
For a supplement, it's a dplyr
solution.
library(dplyr)
df %>% filter(across(-ID, is.na)) %>% count()
# n
# 1 3
Upvotes: 2
Reputation: 527
Keeping in the tidyverse world (assumed since you wanted to use n_distinct)
library(tidyverse)
##Your data
data <- tibble(ID = c(1,2,3,2,3,4),
neckpain = c('Yes',NA,'Yes',NA,'Yes',NA),
backpain = c(NA,NA,'Yes',NA,'Yes',NA),
kneepain = c(NA,NA,'Yes',NA,'Yes',NA))
##Pull out ones are missing across ID and count the rows if you want to cherry pick columns
nrow(data %>%
rowwise() %>%
mutate(row_total = sum(is.na(neckpain),
is.na(backpain),
is.na(kneepain))) %>%
filter(row_total == 3))
[1] 3
##Or if you just want to do it across all rows as noted in the comments
nrow(data %>%
mutate(row_total = rowSums(is.na(.[2:4]))) %>%
filter(row_total == 3))
[1] 3
Upvotes: 1
Reputation: 76402
Here is a one-liner:
sum(apply(df1[-1], 1, function(x) all(is.na(x))))
#[1] 3
Data
df1 <- read.table(text = "
ID neckpain backpain kneepain
1 Yes NA NA
2 NA NA NA
3 Yes Yes Yes
2 NA NA NA
3 Yes Yes Yes
4 NA NA NA
", header = TRUE)
Upvotes: 1