Reputation: 7517
I want to calculate 2 basic statistics for my data below on the 2 variables y1 and y2.
First, for each group, I want to separately obtain variance * (n_of_group - 1) (e.g., for group==1 the answer will be 6 on y1 and 2 on y2).
Second, for each group, I want to separately obtain covariance * (n_of_group - 1) (e.g., for group==1 the answer will be 0).
I have tried something, but I wonder how to apply the * (n_of_group - 1) part to my R code below?
P.S. n_of_group is simply the count() or n() of each group.
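To make the two target quantities concrete, here is a minimal hand check for group 1 only, with the values copied from the data below. It relies on the fact that the sample variance times (n - 1) is the sum of squared deviations from the mean, and the sample covariance times (n - 1) is the sum of cross-products of deviations. The variable names (`ss_y1`, `ss_y2`, `sp`) are only illustrative.

```r
# Group 1 values, copied from the data in this question
y1 <- c(2, 3, 5, 2)
y2 <- c(3, 4, 4, 5)
n  <- length(y1)

# First statistic: variance * (n - 1), i.e. the sum of squared deviations
ss_y1 <- var(y1) * (n - 1)   # 6
ss_y2 <- var(y2) * (n - 1)   # 2

# Second statistic: covariance * (n - 1), i.e. the sum of cross-products
sp <- cov(y1, y2) * (n - 1)  # 0

# The same numbers computed directly from the deviations
all.equal(ss_y1, sum((y1 - mean(y1))^2))
all.equal(sp, sum((y1 - mean(y1)) * (y2 - mean(y2))))
```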
My desired output is shown below.
z <- "group y1 y2
1 1 2 3
2 1 3 4
3 1 5 4
4 1 2 5
5 2 4 8
6 2 5 6
7 2 6 7
8 3 7 6
9 3 8 7
10 3 10 8
11 3 9 5
12 3 7 6"
dat <- read.table(text = z, header = TRUE)
library(dplyr)  # provides %>%, group_by() and summarise()
dat %>%
  group_by(group) %>%
  summarise(var1 = var(y1), var2 = var(y2)) # how to apply the `*n_of_group-1` to var1 & var2
dat %>%
  group_by(group) %>%
  summarise(co = cov(y1, y2)) # how to apply the `*n_of_group-1` to co, what if `co` was more than 1 number
Desired output (if we put the results above for each group in a 2x2 matrix):
group1 = matrix(c(6,0,0,2),2) # The two repeated elements in the middle (0,0) are
# the second statistic; the other elements are the
# first statistic
group2 = matrix(c(2,-1,-1,2),2)
group3 = matrix(c(6.8,2.6,2.6,5.2),2)
Upvotes: 1
Views: 82
Reputation: 886938
We can also use across
library(dplyr)
dat %>%
group_by(group) %>%
summarise(co = cov(y1, y2) * (n() - 1),
across(c(y1, y2), ~ var(.) * (n() - 1),
.names = 'var_{.col}'), .groups = 'drop')
Output:
# A tibble: 3 x 4
# group co var_y1 var_y2
# <int> <dbl> <dbl> <dbl>
#1 1 0 6 2
#2 2 -1 2 2
#3 3 2.6 6.8 5.2
In addition, it may be better to create the n column first with add_count():
library(dplyr)  # add_count() comes from dplyr
dat %>%
add_count(group) %>%
group_by(group) %>%
summarise(co = cov(y1, y2) * (first(n) - 1),
across(c(y1, y2), ~ var(.) * (first(n)- 1),
.names = 'var_{.col}'), .groups = 'drop')
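As a cross-check on either version above, the same numbers can be computed without `n()` at all, because `var(x) * (n - 1)` is the sum of squared deviations and `cov(x, y) * (n - 1)` is the sum of cross-products of deviations. This is only a sketch: the data frame is rebuilt by hand from the question's table, and the output names (`ss_y1`, `ss_y2`, `sp`) are my own.

```r
library(dplyr)

# Data rebuilt by hand from the table in the question
dat <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  y1    = c(2, 3, 5, 2, 4, 5, 6, 7, 8, 10, 9, 7),
  y2    = c(3, 4, 4, 5, 8, 6, 7, 6, 7, 8, 5, 6)
)

ss <- dat %>%
  group_by(group) %>%
  summarise(
    ss_y1 = sum((y1 - mean(y1))^2),                   # same as var(y1) * (n() - 1)
    ss_y2 = sum((y2 - mean(y2))^2),                   # same as var(y2) * (n() - 1)
    sp    = sum((y1 - mean(y1)) * (y2 - mean(y2))),   # same as cov(y1, y2) * (n() - 1)
    .groups = "drop"
  )
ss
```

If these values agree with the `var(.) * (n() - 1)` results, the `(n() - 1)` scaling was applied to the right group sizes.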
Upvotes: 1
Reputation: 66415
Is this what you want?
dat %>%
group_by(group) %>%
summarise(var1 = var(y1) * (n()-1),
var2 = var(y2) * (n()-1),
co = cov(y1, y2) * (n()-1))
# A tibble: 3 x 4
group var1 var2 co
* <int> <dbl> <dbl> <dbl>
1 1 6 2 0
2 2 2 2 -1
3 3 6.8 5.2 2.6
To output into separate matrices for each group:
dat %>%
group_by(group) %>%
summarise(var1 = var(y1) * (n()-1),
var2 = var(y2) * (n()-1),
co = cov(y1, y2) * (n()-1),
co2 = co) %>%
select(group, var1, co, co2, var2) -> a
split(a, a$group) -> a
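lapply(a, function(x) { x["group"] <- NULL; x }) -> a
# unlist() turns each one-row tibble into a numeric vector, so the result is a numeric matrix rather than a list matrix
lapply(a, function(x) { matrix(unlist(x), nrow = 2, ncol = 2)})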
$`1`
[,1] [,2]
[1,] 6 0
[2,] 0 2
$`2`
[,1] [,2]
[1,] 2 -1
[2,] -1 2
$`3`
[,1] [,2]
[1,] 6.8 2.6
[2,] 2.6 5.2
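A shorter route to the same per-group matrices, offered as a base R sketch rather than a replacement for the pipeline above: `cov()` on a two-column data frame already returns the full 2x2 covariance matrix, so multiplying it by `nrow(d) - 1` gives the desired matrix in one step. The data frame is rebuilt by hand from the question, and `scatter` is just an illustrative name.

```r
# Data rebuilt by hand from the table in the question
dat <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  y1    = c(2, 3, 5, 2, 4, 5, 6, 7, 8, 10, 9, 7),
  y2    = c(3, 4, 4, 5, 8, 6, 7, 6, 7, 8, 5, 6)
)

# Split the two variables by group, then scale each covariance matrix by (n - 1)
scatter <- lapply(
  split(dat[c("y1", "y2")], dat$group),
  function(d) cov(d) * (nrow(d) - 1)
)

scatter[["1"]]  # variances * (n - 1) on the diagonal, covariance * (n - 1) off it
```

The resulting matrices keep `y1` and `y2` as row and column names, which makes it easier to see which entry is which.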
Upvotes: 1