Reputation: 39
I have a data frame that contains factor columns and a date column. I am looking to figure out the most efficient way to calculate correlation for each factor pair for each date. Here is an example date frame that I'm working with.
structure(list(MktDate = structure(c(17865, 17865, 17865, 17896,
17896, 17896, 17927, 17927, 17927), class = "Date"), Var1 = c(1,
2, 3, 1, 2, 3, 1, 2, 3), Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5),
Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)), class = "data.frame", row.names = c(NA,
-9L))
I'd like a date frame to be the result in a format similar to what is displayed below
MktDate,FactorPair,Correl
2018-11-30,Var1Var2,-.32733
2018-11-30,Var1Var3,-1
2018-11-30,Var2Var3,.3273
2018-12-31,Var1Var2,-1
...
I'm guessing that this can be easily accomplished using some form of dplyr and apply but I'm not certain how to do it without using a bunch of nested loops.
I appreciate your help in advance.
Upvotes: 0
Views: 1085
Reputation: 2835
Here is a more generalized solution that computes the correlation for pairs of n
columns.
library(reshape2)
library(dplyr)
#Original Data
df_og = data.frame(MktDate = structure(c(17865, 17865, 17865, 17896,
17896, 17896, 17927, 17927, 17927), class = "Date"),
Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5),
Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7))
#Dataframe to store result
df_result = data.frame(MktDate = unique(df_og$MktDate))
#Create pars of variables to eventually itterate over
combs = utils::combn(c("Var1","Var2","Var3"),2)%>%
t()
#Convert to long format data frame and store elements in a vector for each date variable pair
df = df_og %>%
melt(id.vars = "MktDate")%>%
group_by(MktDate,variable)%>%
summarise(val = list(value))%>%
ungroup()
# Itterate over each combination
for(i in seq(1,nrow(combs))){
combination = combs[i,] # Select the combination
new_col_name = paste0(combination,collapse = "") #Define the new column name
df_result = df %>%
filter(variable %in% combination)%>% #Select only the variables in this combination
dcast(MktDate~variable)%>% #Convert back into "fat" formation
group_by(MktDate)%>% #This resets the row names
mutate_(.dots = setNames(
paste0("cor(unlist(",combination[1],"),unlist(",combination[2],"))"),
new_col_name))%>% # Compute the correlation
ungroup()%>%
select_(.dots = c("MktDate",new_col_name))%>%
inner_join(df_result,by = "MktDate") #Join with the result dataframe
}
# If required convert it back into a long format
df_result = df_result%>%
melt(id.vars = "MktDate")%>%
arrange(MktDate)
The advantage of this piece of code is that it's flexible. You can add a new "Var4, Var5, Var6" and you will only need to supply the new column names in combn
. combn
calculates every pair of variables and the remaining code calculates the correlation between these pairs.
Upvotes: 2
Reputation: 903
I have created a column to pair the reps together, if they aren't all triplicates then you'll need to adjust that.
library(reshape)
df<-structure(list(MktDate = structure(c(17865, 17865, 17865, 17896,
17896, 17896, 17927, 17927, 17927), class = "Date"), Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5), Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)),
class = "data.frame", row.names = c(NA,-9L))
df$rep<- rep(seq(1,3),3)
df.mut<-reshape(df, idvar = "MktDate", timevar = "rep", direction = "wide")
var1var2=apply(df.mut,1, function(x) cor(as.numeric(x[seq(2,10,3)]), as.numeric(x[seq(3,10,3)])))
var2var3=apply(df.mut,1, function(x) cor(as.numeric(x[seq(3,10,3)]), as.numeric(x[seq(4,10,3)])))
var1var3=apply(df.mut,1, function(x) cor(as.numeric(x[seq(2,10,3)]), as.numeric(x[seq(4,10,3)])))
results <- data.frame(MktDate = rep(unique(df$MktDate)), FactorPair = rep(c("Var1Var2", "Var2Var3", "Var1Var3"), each =3 ),
cor= c(var1var2,var2var3,var1var3))
results <- results[order(results$MktDate),]
Upvotes: 0