Reputation: 5068
I have a time series of multiple factors:
df = read.table(text="
date factor stock value
30-Jun-17 DivYield AAPL 0.05
30-Jun-17 DivYield GOOG 0.055
30-Jun-17 DivYield MSFT 0.02
31-Jul-17 DivYield AAPL 0.055
31-Jul-17 DivYield GOOG 0.05
31-Jul-17 DivYield MSFT 0.025
30-Jun-17 PB AAPL 12
30-Jun-17 PB GOOG 11
30-Jun-17 PB MSFT 16
31-Jul-17 PB AAPL 11
31-Jul-17 PB GOOG 12
31-Jul-17 PB MSFT 14
30-Jun-17 ROE AAPL 0.1
30-Jun-17 ROE GOOG 0.12
30-Jun-17 ROE MSFT 0.12
31-Jul-17 ROE AAPL 0.1
31-Jul-17 ROE GOOG 0.1
31-Jul-17 ROE MSFT 0.12
", header = TRUE)
df$date = lubridate::dmy(df$date)
I need to compute the pairwise correlations between factors (across stocks), separately for each date. For Pearson correlations, the result would look something like:
Date Factor1 Factor2 Correlation.Time.Series
30-Jun-17 DivYield PB -0.998337488
30-Jun-17 DivYield ROE -0.381246426
30-Jun-17 PB ROE 0.327326835
31-Jul-17 DivYield PB -0.984324138
31-Jul-17 DivYield ROE -0.987829161
31-Jul-17 PB ROE 0.944911183
Any ideas on how to attack this one?
Here's my first attempt:
library(tidyverse)
df.spread = spread(df, key = factor, value = value)
first.attempt = df.spread %>%
  select(-stock) %>%
  group_by(date) %>%
  do(as.data.frame(cor(.[,-1])))
That seems to do it. The problem is that the output has no label showing which factor each row corresponds to:
date DivYield PB ROE
1 2017-06-30 1.0000000 -0.9983375 -0.3812464
2 2017-06-30 -0.9983375 1.0000000 0.3273268
3 2017-06-30 -0.3812464 0.3273268 1.0000000
4 2017-07-31 1.0000000 -0.9843241 -0.9878292
5 2017-07-31 -0.9843241 1.0000000 0.9449112
6 2017-07-31 -0.9878292 0.9449112 1.0000000
Upvotes: 0
Views: 344
Reputation: 2960
Check out the corrr package. Combined with a mutate + map combo, it gives you a column of row names so you can match the correlation pairs.
df.spread %>%
  select(-stock) %>%
  group_by(date) %>%
  nest() %>%
  mutate(cor_tbls = map(data, ~corrr::correlate(.x))) %>%
  unnest(cor_tbls)
This gives you:
# A tibble: 6 x 5
date rowname DivYield PB ROE
<date> <chr> <dbl> <dbl> <dbl>
1 2017-06-30 DivYield NA -0.9983375 -0.3812464
2 2017-06-30 PB -0.9983375 NA 0.3273268
3 2017-06-30 ROE -0.3812464 0.3273268 NA
4 2017-07-31 DivYield NA -0.9843241 -0.9878292
5 2017-07-31 PB -0.9843241 NA 0.9449112
6 2017-07-31 ROE -0.9878292 0.9449112 NA
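If you want the long layout from the question (one row per date and factor pair), one option is to keep only the upper triangle of each date's correlation matrix. The following is a minimal base-R sketch rather than the corrr approach above; the helper name pairwise_cor and the output column names factor1, factor2 and correlation are my own choices, and the date is left as the original string to keep the example self-contained.

```r
# Data copied from the question.
df <- read.table(text = "
date factor stock value
30-Jun-17 DivYield AAPL 0.05
30-Jun-17 DivYield GOOG 0.055
30-Jun-17 DivYield MSFT 0.02
31-Jul-17 DivYield AAPL 0.055
31-Jul-17 DivYield GOOG 0.05
31-Jul-17 DivYield MSFT 0.025
30-Jun-17 PB AAPL 12
30-Jun-17 PB GOOG 11
30-Jun-17 PB MSFT 16
31-Jul-17 PB AAPL 11
31-Jul-17 PB GOOG 12
31-Jul-17 PB MSFT 14
30-Jun-17 ROE AAPL 0.1
30-Jun-17 ROE GOOG 0.12
30-Jun-17 ROE MSFT 0.12
31-Jul-17 ROE AAPL 0.1
31-Jul-17 ROE GOOG 0.1
31-Jul-17 ROE MSFT 0.12
", header = TRUE)

# Hypothetical helper: long-format pairwise correlations for one date's rows.
pairwise_cor <- function(d) {
  # Stocks x factors matrix (each cell holds a single value, so mean() is a no-op).
  m <- tapply(d$value, list(d$stock, d$factor), mean)
  cm <- cor(m)  # Pearson by default
  # Row/column positions of the upper triangle, so each pair appears once.
  idx <- which(upper.tri(cm), arr.ind = TRUE)
  data.frame(
    date = as.character(d$date[1]),
    factor1 = rownames(cm)[idx[, "row"]],
    factor2 = colnames(cm)[idx[, "col"]],
    correlation = cm[idx],
    stringsAsFactors = FALSE
  )
}

# Apply the helper to each date and stack the pieces.
result <- do.call(rbind, lapply(split(df, df$date), pairwise_cor))
rownames(result) <- NULL
print(result)
```

This should produce six rows (three factor pairs for each of the two dates). Passing method = "spearman" to cor() would be one way to switch correlation types.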
Upvotes: 2