user3768495

Reputation: 4667

R apply correlation function to a list

I have a data frame like this:

set.seed(1)
category <- c(rep('A',100), rep('B',100), rep('C',100))
var1 <- rnorm(300)
var2 <- rnorm(300)
df<-data.frame(category=category, var1 = var1, var2=var2)

I need to calculate the correlation between var1 and var2 within each category. I think I can first split df by category and then apply the cor function to the resulting list, but I am confused about how to use the lapply function. Could someone kindly help me out?

Upvotes: 0

Views: 4592

Answers (3)

MrFlick

Reputation: 206566

And just for comparison, here's how you'd do it with the dplyr package.

library(dplyr)
df %>% group_by(category) %>% summarize(cor=cor(var1,var2))

#   category         cor
# 1        A -0.05043706
# 2        B  0.13519013
# 3        C -0.04186283

Upvotes: 1

B.Shankar

Reputation: 1281

This should produce the desired result:

lapply(split(df, df$category), function(dfs) cor(dfs$var1, dfs$var2))

EDIT:

You can also use by (as suggested by @thelatemail):

by(df, df$category, function(x) cor(x$var1,x$var2)) 
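
For anyone unsure what lapply is doing in the split approach, here is a minimal step-by-step sketch. It rebuilds the question's data in a self-contained way (using rep(..., each = 100) and rnorm(300), which should draw the same values as the original code), and the names pieces and cors are my own, not from the question.

```r
# Rebuild the example data from the question (self-contained).
set.seed(1)
df <- data.frame(
  category = rep(c("A", "B", "C"), each = 100),
  var1 = rnorm(300),
  var2 = rnorm(300)
)

# Step 1: split() returns a named list with one data frame per category.
pieces <- split(df, df$category)
names(pieces)    # "A" "B" "C"
nrow(pieces$A)   # 100

# Step 2: lapply() calls the function once per list element and
# collects the results in a list that keeps the same names.
cors <- lapply(pieces, function(d) cor(d$var1, d$var2))

# Each element is a single correlation, accessible by category name.
cors$A
```

Splitting on df$category rather than a free-standing category vector means the code still works if that vector is later removed from the workspace.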

Upvotes: 2

Robert

Reputation: 5162

You can use sapply to get the same result as a named numeric vector instead of a list:

sapply(split(df, df$category), function(dfs) cor(dfs$var1, dfs$var2))
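
If you want the vector result but with a guarantee about its type, base R's vapply is a stricter alternative to sapply. This is only a sketch under the same assumptions as the question's data; the name cors is mine.

```r
# Rebuild the example data from the question (self-contained).
set.seed(1)
df <- data.frame(
  category = rep(c("A", "B", "C"), each = 100),
  var1 = rnorm(300),
  var2 = rnorm(300)
)

# vapply() works like sapply(), but FUN.VALUE declares that every call
# must return exactly one numeric value; anything else raises an error
# instead of silently changing the shape of the result.
cors <- vapply(
  split(df, df$category),
  function(d) cor(d$var1, d$var2),
  FUN.VALUE = numeric(1)
)

# The result is a named numeric vector, so you can index by category.
cors[["B"]]
```

The extra argument costs a few characters but makes the output shape predictable, which helps if the per-group function later changes.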

Upvotes: 1
