Reputation: 2945
I want to use summarize
and across
from dplyr
to count the number of non-NA
values by my grouping variable. For example, using these data:
library(tidyverse)
d <- tibble(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
Col1 = c(5, 8, 2, NA, 2, 2, NA, NA, 1),
Col2 = c(NA, 2, 1, NA, NA, NA, 1, NA, NA),
Col3 = c(1, 5, 2, 4, 1, NA, NA, NA, NA))
# A tibble: 9 x 4
ID Col1 Col2 Col3
<dbl> <dbl> <dbl> <dbl>
1 1 5 NA 1
2 1 8 2 5
3 1 2 1 2
4 2 NA NA 4
5 2 2 NA 1
6 2 2 NA NA
7 3 NA 1 NA
8 3 NA NA NA
9 3 1 NA NA
With a solution resembling:
d %>%
group_by(ID) %>%
summarize(across(matches("^Col[1-3]$"),
#function to count non-NA per column per ID
))
With the following result:
# A tibble: 3 x 4
ID Col1 Col2 Col3
<dbl> <dbl> <dbl> <dbl>
1 1 3 2 3
2 2 2 0 2
3 3 1 1 0
Upvotes: 7
Views: 1688
Reputation: 21938
I hope this is what you are looking for:
library(dplyr)
d %>%
group_by(ID) %>%
summarise(across(Col1:Col3, ~ sum(!is.na(.x)), .names = "non-{.col}"))
# A tibble: 3 x 4
ID `non-Col1` `non-Col2` `non-Col3`
<dbl> <int> <int> <int>
1 1 3 2 3
2 2 2 0 2
3 3 1 1 0
Or if you would like to select columns by their shared string you can use this:
d %>%
group_by(ID) %>%
summarise(across(contains("Col"), ~ sum(!is.na(.x)), .names = "non-{.col}"))
Upvotes: 9