Reputation: 73
There are some other threads about this already. I want to implement the following suggested solution.
As an example data set:
data(Leinhardt, package = "carData")
dat <- tibble::rowid_to_column(Leinhardt, var = "ID")
dat$income <- as.numeric(dat$income)
head(dat)
ID income infant region oil
1 3426 26.7 Asia no
2 3350 23.7 Europe no
3 3346 17.0 Europe no
4 4751 16.8 Americas no
5 5029 13.5 Europe no
6 3312 10.1 Europe no
This is the solution that, I believe, was suggested in other posts, followed by the error I get. Why is this happening?
library(tidyverse)
library(broom)
dat %>%
group_by(region) %>%
summarize(correlation = cor(infant, income, method = "sp"))
Error in summarize(., correlation = cor(infant, income, method = "sp")) :
argument "by" is missing, with no default
R version: "R version 4.0.4 (2021-02-15)"; dplyr version: "1.0.4".
(I have posted this in another question before which I deleted, because there were two separate questions which caused confusion.)
Thank you.
Upvotes: 0
Views: 1276
Reputation: 4889
Another option is to use the correlation package, which supports many more correlation methods than the cor function and works with grouped data frames from dplyr:
library(correlation)
library(carData)
library(magrittr)
dat <- tibble::rowid_to_column(carData::Leinhardt, var = "ID")
dat$income <- as.numeric(dat$income)
library(tidyverse)
dat %>%
group_by(region) %>%
correlation(method = "spearman")
#> # Correlation table (spearman-method)
#>
#> Group | Parameter1 | Parameter2 | rho | 95% CI | S | p
#> ----------------------------------------------------------------------------------
#> Africa | ID | income | -0.61 | [-0.79, -0.33] | 10509.91 | < .001***
#> Africa | ID | infant | 0.19 | [-0.17, 0.50] | 5294.81 | 0.558
#> Africa | income | infant | -0.13 | [-0.46, 0.23] | 7391.82 | 0.558
#> Americas | ID | income | -0.53 | [-0.78, -0.14] | 3096.00 | 0.020*
#> Americas | ID | infant | -0.14 | [-0.54, 0.31] | 2019.07 | 0.534
#> Americas | income | infant | -0.56 | [-0.80, -0.17] | 2761.28 | 0.020*
#> Asia | ID | income | -0.81 | [-0.91, -0.64] | 8158.41 | < .001***
#> Asia | ID | infant | 0.41 | [ 0.02, 0.69] | 1939.59 | 0.035*
#> Asia | income | infant | -0.58 | [-0.79, -0.25] | 5179.87 | 0.003**
#> Europe | ID | income | -0.54 | [-0.81, -0.08] | 1488.00 | 0.044*
#> Europe | ID | infant | 0.14 | [-0.36, 0.58] | 830.00 | 0.570
#> Europe | income | infant | -0.62 | [-0.85, -0.21] | 1574.00 | 0.017*
#>
#> p-value adjustment method: Holm (1979)
#> Observations: 18-34
Created on 2021-02-23 by the reprex package (v1.0.0)
Upvotes: 3
Reputation: 79286
This code works on my machine:
library(dplyr)
library(carData)
df <- Leinhardt
df %>%
group_by(region) %>%
summarize(correlation = cor(infant, income, method = "sp"))
# output
# A tibble: 4 x 2
region correlation
<fct> <dbl>
1 Africa -0.129
2 Americas NA
3 Asia NA
4 Europe -0.624
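Two things in the output and the error above may be worth checking. First, the NA values for Americas and Asia come from missing values in infant, which cor propagates by default. Second, an error about a missing "by" argument is what one would expect if another attached package that also defines summarize (for example Hmisc, whose summarize takes a by argument) is masking the dplyr version; that is an assumption, since the question does not list every attached package. A minimal sketch that qualifies the dplyr calls explicitly and drops incomplete pairs, with res as a name chosen here for illustration:

```r
library(carData)
library(magrittr)  # provides %>%

dat <- tibble::rowid_to_column(Leinhardt, var = "ID")

# Namespace-qualified calls: these use dplyr's verbs even if another
# attached package (assumed here, e.g. Hmisc) masks summarize().
res <- dat %>%
  dplyr::group_by(region) %>%
  dplyr::summarize(
    # use = "complete.obs" drops rows where infant or income is NA
    correlation = cor(infant, income,
                      method = "spearman", use = "complete.obs")
  )

print(res)
```

If the qualified call runs without the error, masking was the likely cause; running conflicts(detail = TRUE) in the same session should show which attached packages define summarize.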
# try this code on your machine:
library(ggcorrplot)
model.matrix(~0+., data=df) %>%
cor(use="pairwise.complete.obs") %>%
ggcorrplot(show.diag = F, type="lower", lab=TRUE, lab_size=2)
This should produce a correlation matrix plot.
Upvotes: 2