Reputation: 449
I have a data frame df. I need to find the correlation between ColE and ColF within each group.
df = structure(list(ColA = c("A", "A", "A", "B", "B"), ColB = c("L",
"L", "L", "L", "K"), ColC = c("Sup1", "Sup1", "Sup2", "Sup1",
"Sup1"), ColD = c("Jan", "Feb", "Mar", "Apr", "May"), ColE = c(56,
59, 68, 45, 45), ColF = c(58, 60, 90, 65, 59)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
ColA ColB ColC ColD ColE ColF
A L Sup1 Jan 56 58
A L Sup1 Feb 59 60
A L Sup2 Mar 68 90
B L Sup1 Apr 45 65
B K Sup1 May 45 59
For the groups defined by ColA and ColB, I need the correlation within each group, so the output should look like this:
New ColA New ColB Correlation coeff
A L ---
B L ---
B K ---
Similarly, I may need the correlation coefficient for other groupings, such as:
New ColA New ColB New ColC Correlation coeff
A L Sup1 ---
A L Sup2 ---
B L Sup1 ---
B K Sup1 ---
Is there a way to do this?
Upvotes: 1
Views: 43
Reputation: 10375
With the data.table package:
> library(data.table)
> data.table(df)[, j = list(kor = cor(ColE, ColF)), by = list(ColA, ColB)]
ColA ColB kor
1: A L 0.982613
2: B L NA
3: B K NA
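The same call extends to the three-column grouping from the question by passing the grouping columns as a character vector. The following is a minimal sketch, not the answer's original code: the helper name `group_cor` is hypothetical, and the `.N > 1` guard is an added assumption so that single-row groups return NA explicitly.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  ColA = c("A", "A", "A", "B", "B"),
  ColB = c("L", "L", "L", "L", "K"),
  ColC = c("Sup1", "Sup1", "Sup2", "Sup1", "Sup1"),
  ColD = c("Jan", "Feb", "Mar", "Apr", "May"),
  ColE = c(56, 59, 68, 45, 45),
  ColF = c(58, 60, 90, 65, 59)
)

# Hypothetical helper: correlation of ColE and ColF within each group.
# group_cols is a character vector of grouping column names.
group_cor <- function(dt, group_cols) {
  dt[, .(kor = if (.N > 1) cor(ColE, ColF) else NA_real_), by = group_cols]
}

res2 <- group_cor(dt, c("ColA", "ColB"))          # one row per ColA/ColB group
res3 <- group_cor(dt, c("ColA", "ColB", "ColC"))  # one row per ColA/ColB/ColC group
print(res3)
```

Passing the column names as a vector means the grouping can be changed without rewriting the expression inside `[ ]`.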
Upvotes: 1
Reputation: 39858
With dplyr, you can do:
library(dplyr)
df %>%
  group_by(ColA, ColB) %>%
  summarise(corr_coeff = cor(ColE, ColF))
ColA ColB corr_coeff
<chr> <chr> <dbl>
1 A L 0.983
2 B K NA
3 B L NA
Note that no coefficient is calculated for two of the groups (the result is NA) because each contains only a single row, and a correlation needs at least two observations.
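For the second grouping in the question (ColA, ColB and ColC), the same pattern should work with one more column in `group_by()`. This is a sketch under some assumptions: the `n_obs` column and the explicit `n() > 1` check are additions for illustration, and `.groups = "drop"` is used only to return an ungrouped result.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  ColA = c("A", "A", "A", "B", "B"),
  ColB = c("L", "L", "L", "L", "K"),
  ColC = c("Sup1", "Sup1", "Sup2", "Sup1", "Sup1"),
  ColD = c("Jan", "Feb", "Mar", "Apr", "May"),
  ColE = c(56, 59, 68, 45, 45),
  ColF = c(58, 60, 90, 65, 59)
)

res <- df %>%
  group_by(ColA, ColB, ColC) %>%
  summarise(
    n_obs = n(),  # rows per group, to show why a value is NA
    corr_coeff = if (n() > 1) cor(ColE, ColF) else NA_real_,
    .groups = "drop"
  )

print(res)
```

With this data only the (A, L, Sup1) group has more than one row. A group with exactly two rows always gives a coefficient of 1 or -1, so the `n_obs` column helps judge how much weight each value deserves.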
Upvotes: 0