I am trying to count the NA values across groups of variables in my dataset. I can do it with mutate from dplyr, but I've only managed to do it using column numbers rather than column names. That is quite annoying because I have a large number of variables and I might drop some of them.
Here is the code that I have:
master_clean <- master_clean %>%
  mutate(nbNA_pt1 = rowSums(is.na(master_clean[, c(1:3)])),
         nbNA_pt2 = rowSums(is.na(master_clean[, c(4:6)])),
         nbNA_pt3 = rowSums(is.na(master_clean[, c(7:9)])))
And here is what I would like to have:
master_clean <- master_clean %>%
  mutate(nbNA_pt1 = rowSums(is.na(master_clean[, c("Q1":"Q12")])),
         nbNA_pt2 = rowSums(is.na(master_clean[, c("Q13":"Q20")])),
         nbNA_pt3 = rowSums(is.na(master_clean[, c("Q21":"Q90")])))
Is there a way I can do that?
Thanks!
One option, as @Martin Gal already mentioned in the comments, is to use dplyr::across:
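library(dplyr)

# across() without a function returns the selected columns as a data frame,
# so is.na() gives a logical matrix whose TRUEs rowSums() counts per row
master_clean <- master_clean %>%
  mutate(nbNA_pt1 = rowSums(is.na(across(Q1:Q12))),
         nbNA_pt2 = rowSums(is.na(across(Q13:Q20))),
         nbNA_pt3 = rowSums(is.na(across(Q21:Q90))))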
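A small self-contained check of this approach, run on a hypothetical six-question toy table rather than the real data. It is written with pick(), which dplyr 1.1.0 introduced as the preferred way to grab a set of columns without applying a function (so it assumes dplyr 1.1.0 or later). It also uses tidyselect's num_range() to show a selection built from the name pattern Q4, Q5, Q6 instead of from column positions.

```r
library(dplyr)

# Hypothetical toy data standing in for master_clean: six questions Q1..Q6
master_clean <- tibble(
  Q1 = c(1, NA, 3),
  Q2 = c(NA, NA, 2),
  Q3 = c(4, 5, NA),
  Q4 = c(NA, 1, 1),
  Q5 = c(2, NA, 3),
  Q6 = c(1, 1, 1)
)

master_clean <- master_clean %>%
  mutate(
    # pick() returns the selected columns as a data frame; is.na() turns it
    # into a logical matrix and rowSums() counts the TRUEs in each row
    nbNA_pt1 = rowSums(is.na(pick(Q1:Q3))),
    # num_range("Q", 4:6) selects Q4, Q5 and Q6 by name, not by position
    nbNA_pt2 = rowSums(is.na(pick(num_range("Q", 4:6))))
  )

print(master_clean)
```

Because both selections are by name, adding or removing unrelated columns elsewhere in the table does not shift which columns are counted.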
The other option is to use rowwise, in which case you replace rowSums with sum and use c_across instead of across. This approach is slower, and it is mainly useful when there is no vectorised row-wise function, such as rowSums, for the operation you need.
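master_clean <- master_clean %>%
  rowwise() %>%
  # c_across() collects the current row's values into a single vector
  mutate(nbNA_pt1 = sum(is.na(c_across(Q1:Q12))),
         nbNA_pt2 = sum(is.na(c_across(Q13:Q20))),
         nbNA_pt3 = sum(is.na(c_across(Q21:Q90)))) %>%
  ungroup()  # drop the row-wise grouping so later verbs work on whole columns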
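To confirm that the rowwise() route gives the same counts as the vectorised one, here is a sketch on a hypothetical three-column toy table. The ungroup() step matters: rowwise() leaves the result grouped by row, and that grouping would otherwise carry over into later verbs. The comparison column uses pick(), so this assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data: three questions with some missing answers
df <- tibble(
  Q1 = c(1, NA, 3),
  Q2 = c(NA, NA, 2),
  Q3 = c(4, 5, NA)
)

res <- df %>%
  rowwise() %>%
  # c_across() gathers this row's Q1..Q3 values into one vector,
  # so a plain sum() of is.na() counts the missing entries in that row
  mutate(nbNA_rowwise = sum(is.na(c_across(Q1:Q3)))) %>%
  # remove the row-wise grouping before continuing
  ungroup() %>%
  # vectorised equivalent for comparison: a single rowSums() over all rows
  mutate(nbNA_vectorised = rowSums(is.na(pick(Q1:Q3))))

print(res)
```

Both columns hold the same counts; the rowwise version evaluates sum() once per row, which is why it scales worse on large tables.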