How to summarise across different types of variables with dplyr::c_across()

Question

I have data with different types of variables. Some are character, some factors, and some numeric, like below:

df <- data.frame(a = c("tt", "ss", "ss", NA), b=c(2,3,NA,1), c=c(1,2,NA, NA), d=c("tt", "ss", "ss", NA))

I'm trying to count the number of missing values per observation using c_across in dplyr. However, c_across doesn't seem to be able to combine values of different types, as the error message below suggests:

df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across())))

Error: Problem with summarise() input NAs.
x Can't combine a and b.
ℹ Input NAs is sum(is.na(c_across())).
ℹ The error occurred in row 1.

Indeed, if I include only numeric variables, it works.

df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across(b:c))))

The same is true if I include only character variables:

df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across(c(a,d)))))

I could solve the issue without using c_across like below, but I have lots of variables, so it's not very practical.

df %>%
  rowwise() %>%
  summarise(NAs = is.na(a)+is.na(b)+is.na(c)+is.na(d))

I could use the traditional apply approach, like below, but I'd like to solve this using dplyr.

apply(df, 1, function(x)sum(is.na(x)))

Any suggestions as to how to compute the number of missing values, row-wise, efficiently, and using dplyr?

akrun · Accepted Answer

A much faster option is to skip rowwise and c_across altogether and use rowSums instead:

library(dplyr)
df %>%
  mutate(NAs = rowSums(is.na(.)))
#     a  b  c    d NAs
#1   tt  2  1   tt   0
#2   ss  3  2   ss   0
#3   ss NA NA   ss   2
#4 <NA>  1 NA <NA>   3

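The same count can also be written with across(), which avoids the `.` placeholder. This is only a sketch of an alternative, assuming dplyr 1.0.0 or later; `res` is a name introduced here for illustration.

```r
library(dplyr)

df <- data.frame(a = c("tt", "ss", "ss", NA), b = c(2, 3, NA, 1),
                 c = c(1, 2, NA, NA), d = c("tt", "ss", "ss", NA))

# across() applies is.na() to every column and returns a data frame of
# TRUE/FALSE flags; rowSums() then counts the TRUE values in each row.
res <- df %>%
  mutate(NAs = rowSums(across(everything(), is.na)))

res
```

Because is.na() is applied column by column, the columns never have to be combined into a single vector, so mixed types are not a problem here.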
If we want to count NAs in certain columns only, e.g. the numeric ones:

df %>%
  mutate(NAs = rowSums(is.na(select(., where(is.numeric)))))
#     a  b  c    d NAs
#1   tt  2  1   tt   0
#2   ss  3  2   ss   0
#3   ss NA NA   ss   2
#4 <NA>  1 NA <NA>   1
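If rowwise and c_across are still preferred, one possible workaround is to convert every column to a common type first, since the error in the question comes from c_across refusing to combine a character column with a numeric one. The sketch below assumes that converting to character is acceptable for the purpose of counting NAs (as.character keeps NA as NA); `res` is a name introduced here for illustration. It will likely be slower than rowSums on large data.

```r
library(dplyr)

df <- data.frame(a = c("tt", "ss", "ss", NA), b = c(2, 3, NA, 1),
                 c = c(1, 2, NA, NA), d = c("tt", "ss", "ss", NA))

res <- df %>%
  # Give all columns the same type so c_across() can combine them
  mutate(across(everything(), as.character)) %>%
  rowwise() %>%
  # One output row per input row: the number of NAs in that row
  summarise(NAs = sum(is.na(c_across(everything()))))

res
```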
