Obtaining basic statistics on multiple variables and multiple groups

Question

I want to calculate two basic statistics on the two variables y1 and y2 in my data below.

First, for each group, I want to separately obtain variance * (n_of_group - 1) (e.g., for group == 1 the answer will be 6 on y1 and 2 on y2).

Second, for each group, I want to separately obtain covariance * (n_of_group - 1) (e.g., for group == 1 the answer will be 0).

I have tried something, but I wonder how to apply the * (n_of_group - 1) part in my R code below?

P.S. n_of_group is simply the count() or n() of each group. My desired output is shown below.

z <- "group    y1    y2
1 1         2     3
2 1         3     4
3 1         5     4
4 1         2     5
5 2         4     8
6 2         5     6
7 2         6     7
8 3         7     6
9 3         8     7
10 3        10     8
11 3         9     5
12 3         7     6"

library(dplyr)

dat <- read.table(text = z, header = TRUE)

dat %>%
  group_by(group) %>%
  summarise(var1 = var(y1), var2 = var(y2)) # how to apply `* (n_of_group - 1)` to var1 & var2?

dat %>%
  group_by(group) %>%
  summarise(co = cov(y1, y2)) # how to apply `* (n_of_group - 1)` to co? What if `co` were more than one number?

Desired output (if we put the results above for each group in a 2x2 matrix):

group1 = matrix(c(6,0,0,2),2)   # The two repeated off-diagonal elements (0,0) are
                                # the second statistic; the diagonal elements are
                                # the first statistic
group2 = matrix(c(2,-1,-1,2),2)
group3 = matrix(c(6.8,2.6,2.6,5.2),2)

akrun · Accepted Answer

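We can use across() to apply the same var(.) * (n() - 1) computation to both columns, where n() gives the size of the current group: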

library(dplyr)
dat %>% 
    group_by(group) %>%
    summarise(co = cov(y1, y2) * (n() - 1), 
       across(c(y1, y2), ~ var(.) * (n() - 1), 
             .names = 'var_{.col}'), .groups = 'drop')

Output:

# A tibble: 3 x 4
#  group    co var_y1 var_y2
#  <int> <dbl>  <dbl>  <dbl>
#1     1   0      6      2  
#2     2  -1      2      2  
#3     3   2.6    6.8    5.2
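
As a sanity check on the first row: var(x) * (n - 1) is the sum of squared deviations from the group mean, and cov(x, y) * (n - 1) is the sum of cross-products of those deviations. The base R sketch below recomputes both by hand for group 1, using the values from the question's data. The names g1_y1, g1_y2, ss_y1, ss_y2 and cp are made up for illustration.

```r
# Group 1 values copied from the question's data
g1_y1 <- c(2, 3, 5, 2)
g1_y2 <- c(3, 4, 4, 5)
n <- length(g1_y1)

# Sums of squared deviations from the mean
ss_y1 <- sum((g1_y1 - mean(g1_y1))^2)
ss_y2 <- sum((g1_y2 - mean(g1_y2))^2)

# Sum of cross-products of deviations
cp <- sum((g1_y1 - mean(g1_y1)) * (g1_y2 - mean(g1_y2)))

# Each comparison should be TRUE: the hand-computed sums match
# var() and cov() scaled by (n - 1)
all.equal(ss_y1, var(g1_y1) * (n - 1))
all.equal(ss_y2, var(g1_y2) * (n - 1))
all.equal(cp, cov(g1_y1, g1_y2) * (n - 1))
```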

Alternatively, it may be better to create the count column n first with add_count() and then reuse it inside summarise():

library(dplyr)  # add_count() is exported by dplyr
dat %>% 
   add_count(group) %>%
   group_by(group) %>%
   summarise(co = cov(y1, y2) * (first(n) - 1), 
   across(c(y1, y2), ~ var(.) * (first(n)- 1), 
             .names = 'var_{.col}'), .groups = 'drop')
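
If the 2x2 matrices from the question are wanted directly, one possible base R sketch (not part of the answer above) is to split the data by group and scale each group's covariance matrix by n - 1. This also handles the case where the covariance is more than one number, since cov() on a data frame returns the full matrix. The list name sscp is made up for illustration, and the data frame is rebuilt here only so the block is self-contained.

```r
# Data from the question, rebuilt so this block is self-contained
dat <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  y1    = c(2, 3, 5, 2, 4, 5, 6, 7, 8, 10, 9, 7),
  y2    = c(3, 4, 4, 5, 8, 6, 7, 6, 7, 8, 5, 6)
)

# One 2x2 matrix per group: variances * (n - 1) on the diagonal,
# covariance * (n - 1) off the diagonal
sscp <- lapply(
  split(dat[c("y1", "y2")], dat$group),
  function(d) cov(d) * (nrow(d) - 1)
)

sscp[["1"]]  # matrix for group 1
sscp[["3"]]  # matrix for group 3
```

The list elements are named by group ("1", "2", "3"), so each matrix can be compared with group1, group2 and group3 from the question.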
