I have a data.frame and I want to calculate correlation coefficients between one column and each of the other columns (there are some non-numeric columns in the frame as well).
ddply(Banks,.(brand_id,standard.quarter),function(x) { cor(BLY11,x) })
# Error in cor(BLY11, x) : 'y' must be numeric
I tried testing with is.numeric(x):
ddply(Banks,.(brand_id,standard.quarter),function(x) { if (is.numeric(x)) cor(BLY11,x) else 0 })
but every comparison failed: it returned 0, and only one column, as if the function were being called only once. What is being passed to the function? I'm just coming to R, and I think there's something fundamental I'm missing.
Thanks
Upvotes: 0
From ?cor:
If ‘x’ and ‘y’ are matrices then the covariances (or correlations) between the columns of ‘x’ and the columns of ‘y’ are computed.
So your only real job is to remove the non-numeric columns:
# An example data.frame containing a non-numeric column
d <- cbind(fac=c("A","B"), mtcars)
## Calculate correlations between the mpg column and all numeric columns
cor(d$mpg, as.matrix(d[sapply(d, is.numeric)]))
     mpg       cyl       disp         hp      drat         wt     qsec
[1,]   1 -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
            vs        am      gear       carb
[1,] 0.6640389 0.5998324 0.4802848 -0.5509251
Edit: And in fact, as @MYaseen208's answer shows, there's no need to explicitly convert data.frames to matrices. Both of the following work just fine:
cor(d$mpg, d[sapply(d, is.numeric)])
cor(mtcars, mtcars)
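A quick way to check the claim in the edit: compute the correlations both ways and compare them. This sketch reuses the example data frame d from above; the names keep, r_matrix and r_frame are just illustrative.

```r
# Example data: mtcars plus a non-numeric column, as in the answer
d <- cbind(fac = c("A", "B"), mtcars)
keep <- sapply(d, is.numeric)                # FALSE for fac, TRUE for the rest

r_matrix <- cor(d$mpg, as.matrix(d[keep]))   # explicit conversion to a matrix
r_frame  <- cor(d$mpg, d[keep])              # cor() accepts the data frame as-is

all.equal(r_matrix, r_frame)                 # TRUE
dim(r_frame)                                 # 1 11: one row, one column per numeric variable
```

Either way the result is a 1 x k matrix whose column names are the numeric columns of d.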
Upvotes: 5
Try something like this:
cor(longley[, 1], longley[ , sapply(longley, is.numeric)])
     GNP.deflator       GNP Unemployed Armed.Forces Population      Year  Employed
[1,]            1 0.9915892  0.6206334    0.4647442  0.9791634 0.9911492 0.9708985
Upvotes: 5
ddply splits a data.frame into chunks and sends each chunk (a smaller data.frame) to your function. Your x is therefore a data.frame with the same columns as Banks, so is.numeric(x) is FALSE, while is.data.frame(x) should return TRUE.
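To see what each call receives without installing plyr, base R's split() performs a comparable splitting step. A minimal sketch, using mtcars with an added label column as a hypothetical stand-in for Banks and cyl as the grouping variable:

```r
# Hypothetical stand-in for Banks: mtcars plus a non-numeric label column
d <- cbind(fac = c("A", "B"), mtcars)

# split() cuts the data frame into one sub-data-frame per cyl value,
# much like the splitting step ddply performs before calling your function
chunks <- split(d, d$cyl)

is_df  <- sapply(chunks, is.data.frame)  # TRUE for every chunk
is_num <- sapply(chunks, is.numeric)     # FALSE for every chunk
n_rows <- sapply(chunks, nrow)           # rows per group: 11, 7, 14
```

Each chunk still contains every column, including the non-numeric one, which is why a bare is.numeric(x) test on the chunk always fails.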
Try:
function(x) {
cor(x$BLY11, x$othercolumnname)
}
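Extending the x$column idea, a per-chunk function can correlate one target column against all other numeric columns at once and return a one-row data frame, a shape ddply can combine. The helper cor_against and its target and drop arguments are hypothetical; the sketch runs the split/apply/combine steps in base R on mtcars, grouped by am, so it does not depend on plyr.

```r
# Hypothetical helper (not part of plyr): correlate one target column against
# every other numeric column of a chunk, returned as a one-row data frame
cor_against <- function(chunk, target, drop = character(0)) {
  num <- chunk[sapply(chunk, is.numeric)]           # keep numeric columns only
  num <- num[setdiff(names(num), c(target, drop))]  # skip the target and grouping columns
  as.data.frame(cor(chunk[[target]], num))
}

# Base-R split/apply/combine standing in for ddply: group mtcars by am,
# then stack the per-group results into one data frame
pieces <- lapply(split(mtcars, mtcars$am), cor_against, target = "mpg", drop = "am")
per_group <- do.call(rbind, pieces)
per_group
```

The grouping column is excluded because it is constant within each group, which would make cor() return NA with a warning. With plyr loaded, the same helper could presumably be passed directly, e.g. ddply(Banks, .(brand_id, standard.quarter), cor_against, target = "BLY11", drop = c("brand_id", "standard.quarter")); that call is untested and assumes BLY11 is a numeric column of Banks.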
Upvotes: 2
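This function operates on a single chunk:
calc_cor_only_numeric <- function(chunk) {
  # Keep only the numeric columns (index with the logical vector itself;
  # -is_numeric would coerce it to -1/0 and drop the wrong columns)
  is_numeric <- sapply(chunk, is.numeric)
  cor(chunk[is_numeric])
}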
It can then be passed to ddply:
ddply(Banks, .(cat1, cat2), calc_cor_only_numeric)
I could not check the code, but this should get you started.
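When selecting columns with a logical vector, chunk[-is_numeric] and chunk[is_numeric] pick different columns: unary minus coerces TRUE/FALSE to 1/0 and negates them, so the result is a "drop these positions" index rather than a flipped selection. A small self-contained illustration with a toy data frame (df, is_num, wrong, right and rest are illustrative names):

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(2, 4, 7))
is_num <- sapply(df, is.numeric)  # named logical: a = TRUE, b = FALSE, c = TRUE

# Unary minus gives c(a = -1, b = 0, c = -1), which as an index only drops column 1
wrong <- names(df[-is_num])       # "b" "c": the non-numeric b survives

# Indexing with the logical vector keeps exactly the TRUE columns;
# !is_num selects the complement
right <- names(df[is_num])        # "a" "c"
rest  <- names(df[!is_num])       # "b"
```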
Upvotes: 2
It looks like what you're doing can be done with sapply as well:
with(Banks,
sapply( list(brand_id,standard.quarter), function(x) cor(BLY11,x) )
)
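Because a data frame is a list of columns, sapply can also walk the columns directly, calling cor() once per column. A sketch using mtcars plus a non-numeric label column, with mpg in the role the question gives BLY11; the comparison against a single vectorised cor() call is only a sanity check.

```r
# mtcars plus a non-numeric column, standing in for a frame like Banks
d <- cbind(fac = c("A", "B"), mtcars)
num <- Filter(is.numeric, d)      # drop non-numeric columns

# Correlate mpg with each numeric column, one column at a time
r_each <- sapply(num, function(col) cor(d$mpg, col))

# Same numbers as the single vectorised call, dropped to a named vector
r_all <- drop(cor(d$mpg, num))
all.equal(r_each, r_all)          # TRUE
```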
Upvotes: 1