Dev P

Reputation: 449

Correlation between groups

I have a data frame df. I need to find the correlation between ColE and ColF within each group.

   df = structure(list(ColA = c("A", "A", "A", "B", "B"), ColB = c("L", 
   "L", "L", "L", "K"), ColC = c("Sup1", "Sup1", "Sup2", "Sup1", 
   "Sup1"), ColD = c("Jan", "Feb", "Mar", "Apr", "May"), ColE = c(56, 
   59, 68, 45, 45), ColF = c(58, 60, 90, 65, 59)), row.names = c(NA, 
   -5L), class = c("tbl_df", "tbl", "data.frame"))
   ColA    ColB      ColC      ColD      ColE       ColF
    A       L         Sup1      Jan       56         58
    A       L         Sup1      Feb       59         60
    A       L         Sup2      Mar       68         90
    B       L         Sup1      Apr       45         65
    B       K         Sup1      May       45         59

For the groups defined by ColA and ColB, I need the correlation within each group, so the output should look like this:

   New ColA     New ColB       Correlation coeff
      A            L                   ---
      B            L                   ---
      B            K                   ---

Similarly, I may need the correlation coefficient for other groupings, such as:

     New ColA     New ColB      New ColC    Correlation coeff
      A            L               Sup1               ---
      A            L               Sup2               ---
      B            L               Sup1               ---   
      B            K               Sup1               --- 

Is there a way to solve this?

Upvotes: 1

Views: 43

Answers (2)

user2974951

Reputation: 10375

With the data.table package:

> library(data.table)
> data.table(df)[, j = list(kor = cor(ColE, ColF)), by = list(ColA, ColB)]

   ColA ColB      kor
1:    A    L 0.982613
2:    B    L       NA
3:    B    K       NA
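The question also asks about grouping by three columns. A minimal sketch of how that might look, assuming the same data as in the question (rebuilt here so the snippet runs on its own); the names `dt`, `res_dt` and `n` are only illustrative. The extra `n` column shows the group size, which makes it easier to see which groups have too few rows for a correlation.

```r
library(data.table)

# Same values as the question's df, rebuilt as a data.table (ColD omitted, it is not used)
dt <- data.table(
  ColA = c("A", "A", "A", "B", "B"),
  ColB = c("L", "L", "L", "L", "K"),
  ColC = c("Sup1", "Sup1", "Sup2", "Sup1", "Sup1"),
  ColE = c(56, 59, 68, 45, 45),
  ColF = c(58, 60, 90, 65, 59)
)

# One row per (ColA, ColB, ColC) group:
# n is the number of rows in the group, kor the correlation of ColE and ColF.
# Groups with a single row get NA, since a correlation needs at least two points.
res_dt <- dt[, .(n = .N, kor = cor(ColE, ColF)), by = .(ColA, ColB, ColC)]
print(res_dt)
```

Adding or removing columns in `by` is all that changes between groupings. Note that a group with exactly two rows will always give a coefficient of 1 or -1 (or NA if one column is constant), so the result is only informative for larger groups.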

Upvotes: 1

tmfmnk

Reputation: 39858

With dplyr, you can do:

library(dplyr)

df %>%
 group_by(ColA, ColB) %>%
 summarise(corr_coeff = cor(ColE, ColF))

  ColA  ColB  corr_coeff
  <chr> <chr>     <dbl>
1 A     L         0.983
2 B     K        NA    
3 B     L        NA  

Note that no coefficient is calculated for the two B groups: each contains only a single row, so cor() returns NA.
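To switch between groupings (for example ColA, ColB versus ColA, ColB, ColC, as the question asks), the grouping columns could be passed as a character vector. This is only a sketch: `cor_by()` is a hypothetical helper, not part of dplyr, and it assumes dplyr 1.0.0 or later for `across()` and the `.groups` argument. The data is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# Same values as the question's df (ColD omitted, it is not used)
df <- data.frame(
  ColA = c("A", "A", "A", "B", "B"),
  ColB = c("L", "L", "L", "L", "K"),
  ColC = c("Sup1", "Sup1", "Sup2", "Sup1", "Sup1"),
  ColE = c(56, 59, 68, 45, 45),
  ColF = c(58, 60, 90, 65, 59)
)

# Hypothetical helper: correlation of ColE and ColF within the groups
# defined by the column names in `group_cols`; n is the group size.
cor_by <- function(data, group_cols) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(n = n(), corr_coeff = cor(ColE, ColF), .groups = "drop")
}

res2 <- cor_by(df, c("ColA", "ColB"))          # three groups
res3 <- cor_by(df, c("ColA", "ColB", "ColC"))  # four groups
print(res2)
print(res3)
```

The `n` column makes the NA results easier to interpret: any group with `n` equal to 1 cannot have a correlation.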

Upvotes: 0
