Reputation: 29
I have a dataframe with 6 rows and 14 columns. I calculate the Pearson correlation with:
#read data
data1 <- read.csv("test.csv")
#calculate correlation
pearson_coef <- cor(data1[sapply(data1, is.numeric)])
And I get the correct correlation coefficients. Now I would like to get the significance level of the correlation. So I used:
significance <- cor.test(data1)
But I get this error:
Error in cor.test.default(data1) :
argument "y" is missing, with no default
I don't understand what the problem is. Could you help me?
Moreover, I would like to know if it is possible to get an output (a single dataframe) with the Pearson correlation coefficients and the associated significance levels.
Sorry for the question!
Upvotes: 0
Views: 2234
Reputation: 20463
stats::cor.test takes two inputs, x and y, which are numeric vectors of the same length -- see the documentation, ?cor.test. The long and short of it: you cannot feed cor.test a data.frame.
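For illustration, here is a minimal sketch of the kind of call cor.test will accept (the two columns are just placeholders; substitute whichever pair of numeric variables from your data you want to test):
num <- data1[sapply(data1, is.numeric)]
# cor.test wants two numeric vectors, i.e. one pair of variables at a time
cor.test(num[[1]], num[[2]])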
To get the behavior you are after, you could use the psych package and its corr.test() function. Try the following:
# install.packages("psych")
library(psych)
corr.test(data1[sapply(data1, is.numeric)])
This will return the correlation matrix, the sample size, and a matrix of p-values. Per your example, you could use the following to extract just the p-values and assign them to significance:
significance <- corr.test(data1[sapply(data1, is.numeric)])$p
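If you also want a single dataframe holding both the coefficients and the p-values, as asked in the question, one possible sketch (not the only way) is to flatten the $r and $p matrices yourself:
ct <- corr.test(data1[sapply(data1, is.numeric)])
# take each variable pair once; raw p-values sit below the diagonal of ct$p
# (see the note on p-value adjustment below)
idx <- which(lower.tri(ct$r), arr.ind = TRUE)
cor_table <- data.frame(
  var1 = rownames(ct$r)[idx[, 1]],
  var2 = colnames(ct$r)[idx[, 2]],
  r    = ct$r[idx],
  p    = ct$p[idx]
)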
From the documentation of psych::corr.test:
"For symmetric matrices, raw probabilities are reported below the diagonal and correlations adjusted for multiple comparisons above the diagonal. In the case of different x and ys, the default is to adjust the probabilities for multiple tests."
You can turn off the p-value adjustment by using adjust = "none", like so...
corr.test(data1[sapply(data1, is.numeric)], adjust = "none")
...however, you should use caution when interpreting such results. For more information on adjusted p-values, see ?p.adjust.
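As a quick illustration of what such an adjustment does (the p-values here are made up for the example):
raw_p <- c(0.01, 0.04, 0.20)        # illustrative raw p-values
p.adjust(raw_p, method = "holm")    # Holm-adjusted p-values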
Upvotes: 1
Reputation: 4406
Without a reproducible example this is a bit onerous, so I'll use the iris dataset for your example. Your cor.test() call isn't working because you need to specify which two variables you'd like to compare. Look at ?cor.test for more details.
But if you want to get at this issue with a dataframe, I'd recommend the dplyr package. First you need to understand how to extract values from a cor.test() object. Run the correlation:
iris.cor <- cor.test(iris$Petal.Width, iris$Petal.Length)
Then look at the structure of the resulting object:
str(iris.cor)
Notice that you can extract values from the cor.test() object like this:
iris.cor$estimate
iris.cor$p.value
Now, if you want to include these values in a dataframe, there are several ways to do it. My preference would be a dplyr solution, but there are many ways to skin this cat.
library(dplyr)
iris %>%
summarise(coef=cor.test(Petal.Width, Petal.Length)$estimate,
pval=cor.test(Petal.Width, Petal.Length)$p.value)
This has the advantage that you can easily add other elements of the cor.test() object, or add a grouping variable:
iris %>%
group_by(Species) %>%
summarise(coef=cor.test(Petal.Width, Petal.Length)$estimate,
pval=cor.test(Petal.Width, Petal.Length)$p.value)
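For instance, a sketch that also pulls in the t statistic from the cor.test() object (any other element shown by str(iris.cor) could be added the same way):
iris %>%
  group_by(Species) %>%
  summarise(coef = cor.test(Petal.Width, Petal.Length)$estimate,
            stat = cor.test(Petal.Width, Petal.Length)$statistic,
            pval = cor.test(Petal.Width, Petal.Length)$p.value)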
At the end of the day, R always has several ways of doing things. Here is one.
Upvotes: 0
Reputation: 17412
cor.test has a very specific input method. Let's say you have two variables, x and y, in your data:
cor.test(~ x + y, data)
will get you what you want.
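For example, with the iris data used in the other answer (substitute your own column names):
cor.test(~ Petal.Width + Petal.Length, data = iris)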
Upvotes: 1