user11058887
user11058887

Reputation: 39

Correlation matrix to dataframe, by Date

I have a data frame that contains factor columns and a date column. I am looking to figure out the most efficient way to calculate correlation for each factor pair for each date. Here is an example date frame that I'm working with.

structure(list(MktDate = structure(c(17865, 17865, 17865, 17896, 
17896, 17896, 17927, 17927, 17927), class = "Date"), Var1 = c(1, 
2, 3, 1, 2, 3, 1, 2, 3), Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5), 
    Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)), class = "data.frame", row.names = c(NA, 
-9L))

I'd like a date frame to be the result in a format similar to what is displayed below

MktDate,FactorPair,Correl
2018-11-30,Var1Var2,-.32733
2018-11-30,Var1Var3,-1
2018-11-30,Var2Var3,.3273
2018-12-31,Var1Var2,-1
...

I'm guessing that this can be easily accomplished using some form of dplyr and apply but I'm not certain how to do it without using a bunch of nested loops.

I appreciate your help in advance.

Upvotes: 0

Views: 1085

Answers (2)

Sada93
Sada93

Reputation: 2835

Here is a more generalized solution that computes the correlation for pairs of n columns.

library(reshape2)
library(dplyr)

#Original Data
df_og = data.frame(MktDate = structure(c(17865, 17865, 17865, 17896, 
                                         17896, 17896, 17927, 17927, 17927), class = "Date"),
                   Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5),
                   Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7))

#Dataframe to store result
df_result = data.frame(MktDate = unique(df_og$MktDate))

#Create pars of variables to eventually itterate over
combs = utils::combn(c("Var1","Var2","Var3"),2)%>%
  t()

#Convert to long format data frame and store elements in a vector for each date variable pair
df = df_og %>%
  melt(id.vars = "MktDate")%>%
  group_by(MktDate,variable)%>%
  summarise(val = list(value))%>%
  ungroup()

# Itterate over each combination
for(i in seq(1,nrow(combs))){
  combination = combs[i,] # Select the combination
  new_col_name = paste0(combination,collapse = "") #Define the new column name
  df_result = df %>%
    filter(variable %in% combination)%>% #Select only the variables in this combination
    dcast(MktDate~variable)%>% #Convert back into "fat" formation
    group_by(MktDate)%>% #This resets the row names
    mutate_(.dots = setNames(
      paste0("cor(unlist(",combination[1],"),unlist(",combination[2],"))"),
      new_col_name))%>% # Compute the correlation
    ungroup()%>%
    select_(.dots = c("MktDate",new_col_name))%>%
    inner_join(df_result,by = "MktDate") #Join with the result dataframe
}

# If required convert it back into a long format
df_result = df_result%>%
  melt(id.vars = "MktDate")%>%
  arrange(MktDate)

The advantage of this piece of code is that it's flexible. You can add a new "Var4, Var5, Var6" and you will only need to supply the new column names in combn. combn calculates every pair of variables and the remaining code calculates the correlation between these pairs.

Upvotes: 2

George
George

Reputation: 903

I have created a column to pair the reps together, if they aren't all triplicates then you'll need to adjust that.

library(reshape)
df<-structure(list(MktDate = structure(c(17865, 17865, 17865, 17896, 
              17896, 17896, 17927, 17927, 17927), class = "Date"), Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3), 
              Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5), Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)),
              class = "data.frame", row.names = c(NA,-9L))
df$rep<- rep(seq(1,3),3)

df.mut<-reshape(df, idvar = "MktDate", timevar = "rep", direction = "wide")

var1var2=apply(df.mut,1, function(x) cor(as.numeric(x[seq(2,10,3)]), as.numeric(x[seq(3,10,3)])))
var2var3=apply(df.mut,1, function(x) cor(as.numeric(x[seq(3,10,3)]), as.numeric(x[seq(4,10,3)])))
var1var3=apply(df.mut,1, function(x) cor(as.numeric(x[seq(2,10,3)]), as.numeric(x[seq(4,10,3)])))

results <- data.frame(MktDate = rep(unique(df$MktDate)), FactorPair = rep(c("Var1Var2", "Var2Var3", "Var1Var3"), each =3 ),
                      cor= c(var1var2,var2var3,var1var3))
results <- results[order(results$MktDate),]

Upvotes: 0

Related Questions