LCricket

Reputation: 3

How can I correlate one column against multiple columns using ddply?

I have a data.frame and I want to calculate correlation coefficients using one column against the other columns (there are some non-numeric columns in the frame as well).

ddply(Banks,.(brand_id,standard.quarter),function(x) { cor(BLY11,x) })
# Error in cor(BLY11, x) : 'y' must be numeric

I then tested with is.numeric(x):

ddply(Banks,.(brand_id,standard.quarter),function(x) { if (is.numeric(x)) cor(BLY11,x) else 0 })

but every comparison failed, so it returned 0, and the result had only one column, as if the function were being called only once. What is actually being passed to the function? I'm new to R and I think there's something fundamental I'm missing.

Thanks

Upvotes: 0

Views: 4155

Answers (5)

Josh O'Brien

Reputation: 162321

From ?cor:

If ‘x’ and ‘y’ are matrices then the covariances (or correlations) between the columns of ‘x’ and the columns of ‘y’ are computed.

So your only real job is to remove the non-numeric columns:

# An example data.frame containing a non-numeric column
d <- cbind(fac=c("A","B"), mtcars)

## Calculate correlations between the mpg column and all numeric columns
cor(d$mpg, as.matrix(d[sapply(d, is.numeric)]))
     mpg       cyl       disp         hp      drat         wt     qsec
[1,]   1 -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
            vs        am      gear       carb
[1,] 0.6640389 0.5998324 0.4802848 -0.5509251

Edit: And in fact, as @MYaseen208's answer shows, there's no need to explicitly convert data.frames to matrices. Both of the following work just fine:

cor(d$mpg, d[sapply(d, is.numeric)])

cor(mtcars, mtcars)

Upvotes: 5

MYaseen208

Reputation: 23898

Try something like this:

cor(longley[, 1], longley[ , sapply(longley, is.numeric)])



    GNP.deflator       GNP Unemployed Armed.Forces Population      Year  Employed
[1,]            1 0.9915892  0.6206334    0.4647442  0.9791634 0.9911492 0.9708985

Upvotes: 5

Justin

Reputation: 43255

ddply splits a data.frame into chunks and passes each chunk (a smaller data.frame) to your function. Your x is therefore a data.frame with the same columns as Banks, so is.numeric(x) is FALSE, while is.data.frame(x) returns TRUE.

Try:

function(x) { 
  cor(x$BLY11, x$othercolumnname) 
}
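
To correlate one column against all the other numeric columns within each group, the two ideas above can be combined. The following is only a sketch: it uses mtcars instead of Banks, and the helper cor_against and its arguments target and skip are hypothetical names introduced here, not part of plyr. The grouping column is skipped because it is constant within each chunk and would only produce NA.

```r
library(plyr)

# Hypothetical helper: correlate `target` against the other numeric
# columns of one chunk. `skip` names columns to leave out, such as the
# grouping variable, which is constant inside each chunk.
cor_against <- function(chunk, target, skip = character(0)) {
  # Keep only the numeric columns of the chunk
  num <- chunk[sapply(chunk, is.numeric)]
  # Everything numeric except the target itself and the skipped columns
  others <- num[setdiff(names(num), c(target, skip))]
  # cor() accepts a vector and a data.frame; the result is a 1-row matrix
  as.data.frame(cor(num[[target]], others))
}

# Example data with a non-numeric column, as in the answer above
d <- cbind(fac = c("A", "B"), mtcars)

# One row per value of am, one column per correlated variable
res <- ddply(d, .(am), cor_against, target = "mpg", skip = "am")
```

With several grouping columns, as in the question, the same call would presumably take .(brand_id, standard.quarter) and list both names in skip.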

Upvotes: 2

Paul Hiemstra

Reputation: 60924

This function operates on a chunk:

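calc_cor_only_numeric = function(chunk) {
   # Logical vector: TRUE for each numeric column of the chunk
   is_numeric = sapply(chunk, is.numeric)
   # Keep only the numeric columns, then compute their correlation matrix
   return(cor(chunk[is_numeric]))
 }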

And can be used by ddply:

ddply(banks, .(cat1, cat2), calc_cor_only_numeric)

I could not check the code, but this should get you started.

Upvotes: 2

Blue Magister

Reputation: 13363

It looks like what you're doing can be done with sapply as well:

with(Banks,
  sapply( list(brand_id,standard.quarter), function(x) cor(BLY11,x) )
)

Upvotes: 1
