Reputation: 73

Correlation by Group

There are some other threads about this already. I want to implement the following suggested solution.

As an example data set:

data(Leinhardt, package = "carData")
dat <- tibble::rowid_to_column(Leinhardt, var = "ID")
dat$income <- as.numeric(dat$income)
head(dat)

 ID income infant   region oil
  1   3426   26.7     Asia  no
  2   3350   23.7   Europe  no
  3   3346   17.0   Europe  no
  4   4751   16.8 Americas  no
  5   5029   13.5   Europe  no
  6   3312   10.1   Europe  no

This is, I think, the solution suggested in other posts, followed by the error I get. Why is this happening?

library(tidyverse)
library(broom)

dat  %>% 
  group_by(region) %>%
  summarize(correlation = cor(infant, income, method = "sp"))

Error in summarize(., correlation = cor(infant, income, method = "sp")) : 
  argument "by" is missing, with no default

R version: "R version 4.0.4 (2021-02-15)"; dplyr version: "1.0.4".

(I previously posted this as part of another question, which I deleted because it combined two separate questions and caused confusion.)

Thank you.

Upvotes: 0

Answers (2)

Indrajeet Patil

Reputation: 4889

Another option is to use the correlation package, which supports many more correlation methods than the cor function and works with grouped data frames from dplyr:

library(correlation)
library(carData)
library(magrittr)

dat <- tibble::rowid_to_column(carData::Leinhardt, var = "ID")
dat$income <- as.numeric(dat$income)

library(tidyverse)

dat  %>% 
  group_by(region) %>%
  correlation(method = "spearman")
#> # Correlation table (spearman-method)
#> 
#> Group    | Parameter1 | Parameter2 |   rho |         95% CI |        S |         p
#> ----------------------------------------------------------------------------------
#> Africa   |         ID |     income | -0.61 | [-0.79, -0.33] | 10509.91 | < .001***
#> Africa   |         ID |     infant |  0.19 | [-0.17,  0.50] |  5294.81 | 0.558    
#> Africa   |     income |     infant | -0.13 | [-0.46,  0.23] |  7391.82 | 0.558    
#> Americas |         ID |     income | -0.53 | [-0.78, -0.14] |  3096.00 | 0.020*   
#> Americas |         ID |     infant | -0.14 | [-0.54,  0.31] |  2019.07 | 0.534    
#> Americas |     income |     infant | -0.56 | [-0.80, -0.17] |  2761.28 | 0.020*   
#> Asia     |         ID |     income | -0.81 | [-0.91, -0.64] |  8158.41 | < .001***
#> Asia     |         ID |     infant |  0.41 | [ 0.02,  0.69] |  1939.59 | 0.035*   
#> Asia     |     income |     infant | -0.58 | [-0.79, -0.25] |  5179.87 | 0.003**  
#> Europe   |         ID |     income | -0.54 | [-0.81, -0.08] |  1488.00 | 0.044*   
#> Europe   |         ID |     infant |  0.14 | [-0.36,  0.58] |   830.00 | 0.570    
#> Europe   |     income |     infant | -0.62 | [-0.85, -0.21] |  1574.00 | 0.017*   
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 18-34

Created on 2021-02-23 by the reprex package (v1.0.0)

Upvotes: 3

TarJae

Reputation: 79286

This code works on my machine:

library(dplyr)    # provides %>%, group_by() and summarize()
library(carData)
df <- Leinhardt

df  %>% 
  group_by(region) %>%
  summarize(correlation = cor(infant, income, method = "sp"))

# output
# A tibble: 4 x 2
  region   correlation
  <fct>          <dbl>
1 Africa        -0.129
2 Americas      NA    
3 Asia          NA    
4 Europe        -0.624
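
If the same call fails on your machine with `argument "by" is missing`, one possible cause (an assumption on my part, since the question does not show which packages are attached) is that another package is masking dplyr's summarize(). Hmisc is one known example: its summarize(X, by, FUN) has a required `by` argument. A minimal sketch that sidesteps any masking by namespacing the dplyr calls explicitly is shown below. It also passes use = "complete.obs" to cor(), which is my addition to deal with the NA results for Americas and Asia, on the assumption that they come from missing values in infant:

```r
library(dplyr)
library(carData)

# Calling dplyr::summarize() explicitly means a summarize() from another
# attached package (Hmisc is one known example) cannot be picked up by mistake.
res <- Leinhardt %>%
  dplyr::group_by(region) %>%
  dplyr::summarize(
    # use = "complete.obs" drops rows with missing values within each group
    correlation = cor(infant, income, method = "spearman", use = "complete.obs")
  )

res
```

To check which function an unqualified call resolves to in your session, environment(summarize) shows the namespace it comes from, and conflicts() lists masked names.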

# try this code on your machine:

library(ggcorrplot)
model.matrix(~0+., data=df) %>% 
  cor(use="pairwise.complete.obs") %>% 
  ggcorrplot(show.diag = FALSE, type="lower", lab=TRUE, lab_size=2)

This should produce a correlation matrix plot.

Upvotes: 2
