Reputation: 4667
I have a data frame like this:
set.seed(1)
category <- c(rep('A',100), rep('B',100), rep('C',100))
var1 <- rnorm(300)
var2 <- rnorm(300)
df <- data.frame(category = category, var1 = var1, var2 = var2)
I need to calculate the correlation between var1 and var2 within each category. I think I can first split the df by category and then apply the cor function to each element of the resulting list, but I am really confused about how to use the lapply function.
Could someone kindly help me out?
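To make the split-then-apply idea concrete, here is a minimal sketch that rebuilds the example data and inspects each step. `rnorm(300)` draws the same 300 values as `rnorm(1:300)` (when `n` is a vector, `rnorm()` uses its length). The names `pieces` and `cors` are illustrative only.

```r
set.seed(1)
category <- c(rep("A", 100), rep("B", 100), rep("C", 100))
var1 <- rnorm(300)
var2 <- rnorm(300)
df <- data.frame(category = category, var1 = var1, var2 = var2)

# split() returns a named list holding one data frame per category
pieces <- split(df, df$category)
names(pieces)   # "A" "B" "C"
nrow(pieces$A)  # 100

# lapply() calls the function once per list element; each call receives
# one category's data frame as `d` and returns a single correlation
cors <- lapply(pieces, function(d) cor(d$var1, d$var2))
str(cors)
```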
Upvotes: 0
Views: 4592
Reputation: 206566
And just for comparison, here's how you'd do it with the dplyr package.
library(dplyr)
df %>% group_by(category) %>% summarize(cor=cor(var1,var2))
# category cor
# 1 A -0.05043706
# 2 B 0.13519013
# 3 C -0.04186283
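For a base-R analogue of `group_by()` plus `summarize()`, one option (a sketch, not the only way) is to split only the row indices by category and compute `cor()` on the matching rows of each column, so whole data frames are never copied. `idx` and `cors` are illustrative names.

```r
set.seed(1)
category <- c(rep("A", 100), rep("B", 100), rep("C", 100))
df <- data.frame(category = category, var1 = rnorm(300), var2 = rnorm(300))

# One integer vector of row numbers per category
idx <- split(seq_len(nrow(df)), df$category)

# For each group, correlate the two columns restricted to that group's rows;
# sapply() keeps the group names on the resulting numeric vector
cors <- sapply(idx, function(i) cor(df$var1[i], df$var2[i]))
cors
```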
Upvotes: 1
Reputation: 1281
This should produce the desired result:
lapply(split(df, df$category), function(dfs) cor(dfs$var1, dfs$var2))
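The `lapply()` call returns a named list with one correlation per category. If a table like the dplyr output is more convenient, one option (a sketch; `res` and `out` are illustrative names) is to flatten the list with `unlist()`:

```r
set.seed(1)
category <- c(rep("A", 100), rep("B", 100), rep("C", 100))
df <- data.frame(category = category, var1 = rnorm(300), var2 = rnorm(300))

res <- lapply(split(df, df$category), function(dfs) cor(dfs$var1, dfs$var2))

# unlist() turns the list of length-1 results into a named numeric vector;
# its names become the category column of the summary table
out <- data.frame(category = names(res), cor = unlist(res), row.names = NULL)
out
```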
EDIT:
You can also use by (as suggested by @thelatemail):
by(df, df$category, function(x) cor(x$var1,x$var2))
Upvotes: 2
Reputation: 5162
You can use sapply to get the same result as a named vector rather than a list:
sapply(split(df, df$category), function(dfs) cor(dfs$var1, dfs$var2))
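`sapply()` infers the shape of its result from what the function returns. `vapply()` does the same job but also checks that every call returns exactly one number (declared with `FUN.VALUE = numeric(1)`), and stops with an error otherwise. A minimal sketch on the same data:

```r
set.seed(1)
category <- c(rep("A", 100), rep("B", 100), rep("C", 100))
df <- data.frame(category = category, var1 = rnorm(300), var2 = rnorm(300))

# Same per-category correlation, but the declared FUN.VALUE guarantees
# a named numeric vector of length 1 per group
cors <- vapply(split(df, df$category),
               function(dfs) cor(dfs$var1, dfs$var2),
               FUN.VALUE = numeric(1))
cors
```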
Upvotes: 1