user13069688
user13069688

Reputation: 353

How to calculate the sum of all columns based on a grouped variable and remove NA

I am having a dataset where I would like to group by the ID variable and then calculate the sum of each column / variable. However, I am having some NA as you can see and I would like to remove them while the sum function is being excecuted as they return NA in some rows although some rows of the same ID contain values. I have tried to look around with no success and I have tried different methods again with no success. I would appreciate any help.

Thank you in advance.

data <- data.frame(ID = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
              var1 = c(1, 2, 5, 10, NA, 5, 23, NA, NA, 1),
              var2 = c(1, NA, NA, 1, NA, 0, 1, 3, 23, 4))


data <- data %>%
group_by(ID) %>%
summarise(across(everything(), sum(., na.rm = T)))

Upvotes: 1

Views: 1178

Answers (2)

Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

Just the tilde ~ is missing:

data %>%
  group_by(ID) %>%
  summarise(across(everything(), ~sum(., na.rm = T)))
# A tibble: 4 x 3
     ID  var1  var2
* <dbl> <dbl> <dbl>
1     1     3     1
2     2    15     1
3     3    28     1
4     4     1    30

In case one ID group has only NA values you can do this:

data %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ifelse(all(is.na(.)), NA, sum(., na.rm = T))))

Upvotes: 2

akrun
akrun

Reputation: 887118

We may specify the arguments of the function without using lambda function

library(dplyr)
data %>%
     group_by(ID) %>%
     summarise(across(everything(), sum, na.rm = TRUE), .groups = 'drop')

Upvotes: 1

Related Questions