user14375619

Reputation: 39

Count number of observations by group

I'm trying to count the number of non-missing observations of each variable within each group of a dataset.

The data looks like this:

grp v1  vn 
1   2   5  
2   4      
3   3   4
1       3
1   2   12
4       5
5   3   6
5   6

The result should be a table like this:

grp v1 vn
1   2  3
2   1  0
3   1  1
4   0  1
5   2  1

I tried x %>% group_by(grp) %>% summarise(across(everything(),n = n())), but it didn't work.

Any help is appreciated. Thanks in advance!

Upvotes: 0

Views: 744

Answers (4)

akrun

Reputation: 887128

Using data.table

library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(!is.na(x))), grp]
#   grp v1 vn
#1:   1  2  3
#2:   2  1  0
#3:   3  1  1
#4:   4  0  1
#5:   5  2  1

Upvotes: 2

jay.sf

Reputation: 72828

Using aggregate.

aggregate(cbind(v1, vn) ~ grp, replace(dat, is.na(dat), 0), function(x) sum(as.logical(x)))
#   grp v1 vn
# 1   1  2  3
# 2   2  1  0
# 3   3  1  1
# 4   4  0  1
# 5   5  2  1

Data:

dat <- read.table(header = TRUE, text = 'grp v1  vn 
1   2   5  
2   4   NA   
3   3   4
1   NA  3
1   2   12
4   NA  5
5   3   6
5   6   NA
')
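
One caveat with replacing NA by 0 and then counting via as.logical: a genuine 0 in the data would be treated as missing. As a minimal sketch (not part of the original answer), the non-NA values can instead be counted directly, assuming the same column names as above and passing na.action = na.pass so that the formula interface of aggregate does not drop rows containing NA first. The name res is only illustrative.

```r
# Same data as above, built directly so the example is self-contained
dat <- data.frame(
  grp = c(1, 2, 3, 1, 1, 4, 5, 5),
  v1  = c(2, 4, 3, NA, 2, NA, 3, 6),
  vn  = c(5, NA, 4, 3, 12, 5, 6, NA)
)

# na.pass keeps rows with NA; FUN then counts the non-missing values per group
res <- aggregate(cbind(v1, vn) ~ grp, data = dat,
                 FUN = function(x) sum(!is.na(x)),
                 na.action = na.pass)
res
```

This should give the same counts as above for this data, and it stays correct if a column contains real zeros.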

Upvotes: 0

Anoushiravan R

Reputation: 21918

You can also use the following solution:

library(dplyr)

df %>%
  group_by(grp) %>%
  summarise(across(v1:vn, ~ sum(!is.na(.x))))

# A tibble: 5 x 3
    grp    v1    vn
  <int> <int> <int>
1     1     2     3
2     2     1     0
3     3     1     1
4     4     0     1
5     5     2     1

Upvotes: 2

Ronak Shah

Reputation: 388982

Reshape the data to long format, count the non-NA values for each column in each group, then reshape back to wide format.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -grp) %>%
  group_by(grp, name) %>%
  summarise(n = sum(!is.na(value)), .groups = "drop") %>%
  pivot_wider(names_from = name, values_from = n)

#    grp    v1    vn
#  <int> <int> <int>
#1     1     2     3
#2     2     1     0
#3     3     1     1
#4     4     0     1
#5     5     2     1

Data:

df <- structure(list(grp = c(1L, 2L, 3L, 1L, 1L, 4L, 5L, 5L), v1 = c(2L, 
4L, 3L, NA, 2L, NA, 3L, 6L), vn = c(5L, NA, 4L, 3L, 12L, 5L, 6L, 
NA)), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 2
