Reputation: 567
I want time series correlations in a grouped data frame. Here's a sample dataset:
library(dplyr)
x <- cbind(expand.grid(type = letters[1:4], time = 1:4, kind = letters[5:8]), value = rnorm(64)) %>% arrange(type, time, kind)
which produces 64 rows with the variables type, time, kind and value.
I want a time series correlation of the values for each kind grouped by type. Think of each type and time combination as an ordered vector of 4 values. I group by type, arrange by type, time and kind, then remove kind.
y <- x %>% group_by(type) %>% arrange(type, time, kind) %>% select(-kind)
I can then group y by type and time and nest so that all the values sit together in the data column, then regroup by type only and create a new variable, ahead, which is the lead of data.
z <- y %>% group_by(type, time) %>% nest(value) %>% group_by(type) %>% mutate(ahead = lead(data))
Now I want to run mutate(R = cor(data, ahead)), but I can't seem to get the syntax correct.
I've also tried mutate(R = cor(data$value, ahead$value)) and mutate(R = cor(data[1]$value, ahead[1]$value)), to no avail.
The error I get from cor is: supply both 'x' and 'y' or a matrix-like 'x'.
How do I reference the data and ahead variables as vectors to run with cor?
Ultimately, I'm looking for a 16-row data frame with columns type, time, and R, where R is a single correlation value.
Thank you for your attention.
Upvotes: 0
Views: 349
Reputation: 388862
We can use map2_dbl from purrr to pass data and ahead to the cor function at the same time.
library(dplyr)
z %>%
mutate(R = purrr::map2_dbl(data, ahead, cor)) %>%
select(-data, -ahead)
# type time R
# <fct> <int> <dbl>
# 1 a 1 0.358
# 2 a 2 -0.0498
# 3 a 3 -0.654
# 4 a 4 1
# 5 b 1 -0.730
# 6 b 2 0.200
# 7 b 3 -0.928
# 8 b 4 1
# 9 c 1 0.358
#10 c 2 0.485
#11 c 3 -0.417
#12 c 4 1
#13 d 1 0.140
#14 d 2 -0.448
#15 d 3 -0.511
#16 d 4 1
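Note that R is exactly 1 at time 4 for every type in the output above. As far as I can tell that is not a real correlation: lead() has no next element there, so ahead is NULL and cor(data, NULL) falls back to the 1x1 correlation matrix of data with itself. If you would rather get NA in those rows, here is a minimal sketch. safe_cor is a hypothetical helper name, set.seed() is an assumption added only for reproducibility, and the data is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
x <- expand.grid(type = letters[1:4], time = 1:4, kind = letters[5:8]) %>%
  mutate(value = rnorm(n())) %>%
  arrange(type, time, kind)

# Hypothetical helper: NA when there is no next time step,
# otherwise the correlation of the two value vectors.
safe_cor <- function(d, a) {
  if (is.null(a)) NA_real_ else cor(d$value, a$value)
}

res <- x %>%
  select(-kind) %>%
  group_by(type, time) %>%
  nest() %>%                      # one tibble of 4 values per type/time
  group_by(type) %>%
  mutate(ahead = lead(data),      # next time step within each type
         R = map2_dbl(data, ahead, safe_cor)) %>%
  ungroup() %>%
  select(type, time, R)
```

res then has 16 rows, with NA instead of 1 wherever there is no following time step.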
In base R, we can use mapply:
z$R <- mapply(cor, z$data, z$ahead)
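If you want to stay in base R end to end, the nesting step can be skipped as well. The following is only a sketch under the same assumptions as the question's sample data (4 values per type and time, rows ordered by kind); res_base and the seed are my own additions. It splits the values by time within each type and correlates each time step with the next one, leaving NA for the last step.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
x <- expand.grid(type = letters[1:4], time = 1:4, kind = letters[5:8])
x$value <- rnorm(nrow(x))
x <- x[order(x$type, x$time, x$kind), ]

res_base <- do.call(rbind, lapply(split(x, x$type), function(d) {
  vals <- split(d$value, d$time)  # list of value vectors, ordered by time
  n <- length(vals)
  # correlate step t with step t + 1; the last step has no successor
  R <- c(mapply(cor, vals[-n], vals[-1]), NA_real_)
  data.frame(type = d$type[1],
             time = as.integer(names(vals)),
             R = unname(R))
}))
rownames(res_base) <- NULL
```

This gives the same 16-row layout of type, time and R.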
Upvotes: 1