Brigadeiro
Brigadeiro

Reputation: 2945

Count non-`NA` of several columns by group using summarize and across from dplyr

I want to use summarize and across from dplyrto count the number of non-NA values by my grouping variable. For example, using these data:

library(tidyverse)  
d <- tibble(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
            Col1 = c(5, 8, 2, NA, 2, 2, NA, NA, 1),
            Col2 = c(NA, 2, 1, NA, NA, NA, 1, NA, NA),
            Col3 = c(1, 5, 2, 4, 1, NA, NA, NA, NA))  
# A tibble: 9 x 4
     ID  Col1  Col2  Col3
  <dbl> <dbl> <dbl> <dbl>
1     1     5    NA     1
2     1     8     2     5
3     1     2     1     2
4     2    NA    NA     4
5     2     2    NA     1
6     2     2    NA    NA
7     3    NA     1    NA
8     3    NA    NA    NA
9     3     1    NA    NA

With a solution resembling:

d %>%
  group_by(ID) %>%
  summarize(across(matches("^Col[1-3]$"),
                   #function to count non-NA per column per ID
                   ))

With the following result:

# A tibble: 3 x 4
     ID  Col1  Col2  Col3
  <dbl> <dbl> <dbl> <dbl>
1     1     3     2     3
2     2     2     0     2
3     3     1     1     0

Upvotes: 7

Views: 1688

Answers (1)

Anoushiravan R
Anoushiravan R

Reputation: 21938

I hope this is what you are looking for:

library(dplyr)

d %>%
  group_by(ID) %>%
  summarise(across(Col1:Col3, ~ sum(!is.na(.x)), .names = "non-{.col}"))

# A tibble: 3 x 4
     ID `non-Col1` `non-Col2` `non-Col3`
  <dbl>      <int>      <int>      <int>
1     1          3          2          3
2     2          2          0          2
3     3          1          1          0

Or if you would like to select columns by their shared string you can use this:

d %>%
  group_by(ID) %>%
  summarise(across(contains("Col"), ~ sum(!is.na(.x)), .names = "non-{.col}"))

Upvotes: 9

Related Questions