David Potrel

Reputation: 111

Use mutate with column names in R

I am trying to count the NA values per row across several ranges of variables in my dataset. I can do it with mutate in dplyr, but so far only by using column numbers rather than column names. That is annoying because I have a rather large number of variables and I might drop some of them, which would shift the positions.

Here is the code that I have:

    master_clean <- master_clean %>%
      mutate(nbNA_pt1 = rowSums(is.na(master_clean[, 1:3])),
             nbNA_pt2 = rowSums(is.na(master_clean[, 4:6])),
             nbNA_pt3 = rowSums(is.na(master_clean[, 7:9]))
      )

And here is what I would like to have:

    master_clean <- master_clean %>%
      mutate(nbNA_pt1 = rowSums(is.na(master_clean[, c("Q1":"Q12")])),
             nbNA_pt2 = rowSums(is.na(master_clean[, c("Q13":"Q20")])),
             nbNA_pt3 = rowSums(is.na(master_clean[, c("Q21":"Q90")]))
      )

Is there a way I can do that?

Thanks!

Upvotes: 0

Views: 472

Answers (1)

TimTeaFan

Reputation: 18581

One option, as @Martin Gal already mentioned in the comments, is to use dplyr::across:

    master_clean <- master_clean %>%
      mutate(nbNA_pt1 = rowSums(is.na(across(c(Q1:Q12)))),
             nbNA_pt2 = rowSums(is.na(across(c(Q13:Q20)))),
             nbNA_pt3 = rowSums(is.na(across(c(Q21:Q90))))
      )
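
To see what this computes, here is a minimal sketch on a made-up data frame. The `toy` data and its columns `Q1` to `Q4` are assumptions for illustration, not the asker's data. Inside `across()`, `Q1:Q2` selects the range of columns from `Q1` to `Q2` as they are ordered in the data frame, so the selection keeps working if columns outside that range are dropped.

```r
library(dplyr)

# Hypothetical toy data standing in for master_clean
toy <- tibble(
  Q1 = c(1, NA, 3),
  Q2 = c(NA, NA, 2),
  Q3 = c(5, 6, NA),
  Q4 = c(7, NA, NA)
)

# Count the NAs per row within each named column range
toy_counts <- toy %>%
  mutate(nbNA_pt1 = rowSums(is.na(across(c(Q1:Q2)))),
         nbNA_pt2 = rowSums(is.na(across(c(Q3:Q4)))))

toy_counts$nbNA_pt1  # 1 2 0
toy_counts$nbNA_pt2  # 0 1 2
```

Note that the range depends on column order: if `Q2` were moved after `Q4`, `Q1:Q2` would select all four columns.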

The other option is to use rowwise(), replacing rowSums with sum and across with c_across. This approach is slower and is mainly useful when there is no vectorised function that works on rows, such as rowSums.

    master_clean <- master_clean %>%
      rowwise() %>%
      mutate(nbNA_pt1 = sum(is.na(c_across(c(Q1:Q12)))),
             nbNA_pt2 = sum(is.na(c_across(c(Q13:Q20)))),
             nbNA_pt3 = sum(is.na(c_across(c(Q21:Q90))))
      ) %>%
      ungroup()  # drop the row-wise grouping so later verbs behave normally
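
As a hedged aside, dplyr 1.1.0 introduced `pick()`, which is intended as the replacement for calling `across()` without a function. The sketch below assumes dplyr >= 1.1.0 and uses a made-up `toy` data frame (hypothetical, not the asker's data) to compare the vectorised `pick()` form with the row-wise form.

```r
library(dplyr)

# Hypothetical toy data standing in for master_clean
toy <- tibble(
  Q1 = c(1, NA, 3),
  Q2 = c(NA, NA, 2),
  Q3 = c(5, 6, NA)
)

# Vectorised: pick() returns the selected columns as a data frame
toy_pick <- toy %>%
  mutate(nbNA = rowSums(is.na(pick(Q1:Q3))))

# Row-wise: c_across() returns the selected values of the current row
toy_rowwise <- toy %>%
  rowwise() %>%
  mutate(nbNA = sum(is.na(c_across(Q1:Q3)))) %>%
  ungroup()

toy_pick$nbNA     # 1 2 1
toy_rowwise$nbNA  # 1 2 1
```

Both forms should give the same counts; the `pick()` version avoids the per-row overhead of `rowwise()`.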

Upvotes: 1
