windy

Reputation: 145

Correlate by levels of a variable in R

I would like to correlate two variables and have the output reported separately for levels of a third variable.

My data are similar to this example:

var1 <- c(7, 8, 9, 10, 11, 12)
var2 <- c(18, 17, 16, 15, 14, 13)
categories <- c(1, 2, 3, 1, 2, 3)

I want to correlate var1 with var2 within each category, so that the correlation for category 1 is reported separately from those for category 2 and category 3.

In SAS, I would do:

PROC CORR DATA=x;
  BY CATEGORY;
  VAR VAR1;
  WITH VAR2;
RUN;

Upvotes: 2

Views: 1256

Answers (2)

akrun

Reputation: 887118

You could also use by():

sapply(by(cbind(var1, var2), categories, FUN=cor),`[`,2)
#1  2  3 
#-1 -1 -1 
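
The positional `[`, 2 in the one-liner works because cor() returns a 2x2 matrix for each category, and element 2 of that matrix is the var1/var2 correlation. A more explicit sketch of the same idea, assuming the two columns keep the names var1 and var2 (the names mats and res are only illustrative):

```r
var1 <- c(7, 8, 9, 10, 11, 12)
var2 <- c(18, 17, 16, 15, 14, 13)
categories <- c(1, 2, 3, 1, 2, 3)

# by() runs cor() on the rows of each category and
# returns one 2x2 correlation matrix per category
mats <- by(data.frame(var1, var2), categories, FUN = cor)

# Pull out the var1/var2 entry by name rather than by position
res <- sapply(mats, function(m) m["var1", "var2"])
res
```

Indexing by name costs a few more characters, but it keeps working if the columns are reordered.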

Upvotes: 0

MrFlick

Reputation: 206232

You can put your records into a data.frame, split it by the categories, and then run the correlation for each category.

sapply(
    split(data.frame(var1, var2), categories), 
    function(x) cor(x[[1]],x[[2]])
)

This can look prettier with the dplyr library:

library(dplyr)
data.frame(var1=var1, var2=var2, categories=categories) %>%
    group_by(categories) %>%
    summarize(cor= cor(var1, var2))

Upvotes: 1
