Reputation: 29
I have a dataframe with 6 rows and 14 columns. I calculate the Pearson correlation with:
#read data
data1 <- read.csv("test.csv")
#calculate correlation
pearson_coef <- cor(data1[sapply(data1, is.numeric)])
And I get the correct correlation coefficients. Now I would like to get the significance level of the correlation. So I used:
significance <- cor.test(data1)
But I get this error:
Error in cor.test.default(data1) :
argument "y" is missing, with no default
I don't understand what the problem is. Could you help me?
Moreover, I would like to know if it is possible to get an output (a single dataframe) with the Pearson correlation coefficients and the associated significance levels.
Sorry for the question!
Upvotes: 0
Views: 2234
Reputation: 20463
stats::cor.test takes two inputs, x and y, which are numeric vectors of the same length -- see the documentation, ?cor.test. The long and short of it: you cannot feed cor.test a data.frame.
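For illustration, here is a minimal sketch of the kind of call cor.test will accept (the two columns are just placeholders; substitute whichever pair of numeric variables from your data you want to test):
num <- data1[sapply(data1, is.numeric)]
# cor.test wants two numeric vectors, i.e. one pair of variables at a time
cor.test(num[[1]], num[[2]])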
To get the behavior you are after, you could use the psych package and its corr.test() function. Try the following:
# install.packages("psych")
library(psych)
corr.test(data1[sapply(data1, is.numeric)])
This will return the correlation matrix, the sample size, and a matrix of p-values. Per your example, you could use the following to extract just the p-values and assign them to significance:
significance <- corr.test(data1[sapply(data1, is.numeric)])$p
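If you also want a single dataframe holding both the coefficients and the p-values, as asked in the question, one possible sketch (not the only way) is to flatten the $r and $p matrices yourself:
ct <- corr.test(data1[sapply(data1, is.numeric)])
# take each variable pair once; raw p-values sit below the diagonal of ct$p
# (see the note on p-value adjustment below)
idx <- which(lower.tri(ct$r), arr.ind = TRUE)
cor_table <- data.frame(
  var1 = rownames(ct$r)[idx[, 1]],
  var2 = colnames(ct$r)[idx[, 2]],
  r    = ct$r[idx],
  p    = ct$p[idx]
)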
From the documentation of psych::corr.test:
"For symmetric matrices, raw probabilities are reported below the diagonal and correlations adjusted for multiple comparisons above the diagonal. In the case of different x and ys, the default is to adjust the probabilities for multiple tests."
You can turn off the p-value adjustment by using adjust = "none", like so...
corr.test(data1[sapply(data1, is.numeric)], adjust = "none")
...however, you should use caution when interpreting such results. For more information on adjusted p-values, see ?p.adjust.
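As a quick illustration of what such an adjustment does (the p-values here are made up for the example):
raw_p <- c(0.01, 0.04, 0.20)        # illustrative raw p-values
p.adjust(raw_p, method = "holm")    # Holm-adjusted p-values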
Upvotes: 1
Reputation: 4406
Without a reproducible example this is a bit onerous, so I'll use the iris dataset for your example. Your cor.test() call isn't working because you need to specify which two variables you'd like to compare. Look at ?cor.test for more details.
But if you want to get at this issue with a dataframe, I'd recommend the dplyr package. First you need to understand how to extract values from a cor.test() object. Run the correlation:
iris.cor <- cor.test(iris$Petal.Width, iris$Petal.Length)
Then look at the structure of the resulting object:
str(iris.cor)
Notice that you can extract values from the cor.test() object like this:
iris.cor$estimate
iris.cor$p.value
Now, if you want to include these values in a dataframe, there are several ways to do it. My preference would be a dplyr solution, but there are many ways to skin this cat.
library(dplyr)
iris %>%
summarise(coef=cor.test(Petal.Width, Petal.Length)$estimate,
pval=cor.test(Petal.Width, Petal.Length)$p.value)
This has the advantage that you can easily add other elements of the cor.test() object, or add a grouping variable:
iris %>%
group_by(Species) %>%
summarise(coef=cor.test(Petal.Width, Petal.Length)$estimate,
pval=cor.test(Petal.Width, Petal.Length)$p.value)
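For instance, a sketch that also pulls in the t statistic from the cor.test() object (any other element shown by str(iris.cor) could be added the same way):
iris %>%
  group_by(Species) %>%
  summarise(coef = cor.test(Petal.Width, Petal.Length)$estimate,
            stat = cor.test(Petal.Width, Petal.Length)$statistic,
            pval = cor.test(Petal.Width, Petal.Length)$p.value)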
At the end of the day, R always has several ways of doing things. Here is one.
Upvotes: 0
Reputation: 17412
cor.test has a very specific input method. Let's say you have two variables, x and y, in your data:
cor.test(~ x + y, data)
will get you what you want.
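For example, with the iris data used in the other answer (substitute your own column names):
cor.test(~ Petal.Width + Petal.Length, data = iris)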
Upvotes: 1