Reputation: 380
Consider two time series stored as pandas.Series:
tser_a:
date
2016-05-25 13:30:00.023 50.41
2016-05-26 13:30:00.023 51.96
2016-05-27 13:30:00.030 51.98
2016-05-28 13:30:00.041 52.00
2016-05-29 13:30:00.048 52.01
2016-06-02 13:30:00.049 51.97
2016-06-03 13:30:00.072 52.01
2016-06-04 13:30:00.075 52.10
tser_b:
date
2016-05-24 13:30:00.023 74.41
2016-05-25 13:30:00.023 74.96
2016-05-26 13:30:00.030 74.98
2016-05-27 13:30:00.041 73.00
2016-05-28 13:30:00.048 73.01
2016-05-29 13:30:00.049 73.97
2016-06-02 13:30:00.072 72.01
2016-06-03 13:30:00.075 72.10
I would like to calculate the correlation between these two time series. Pandas offers the pandas.Series.corr (ref) function to compute such a value:
corr = tser_a.corr(tser_b)
However, I need to be sure that the correlation only pairs values that share the exact same date, i.e. that it considers only the intersection of the dates in tser_a and tser_b.
As pseudocode:
if (tser_a[date_x] IS NOT NIL) AND (tser_b[date_x] IS NOT NIL):
    consider(tser_a[date_x], tser_b[date_x])
else:
    skip and go ahead
Then:
tser_b -> 2016-05-24 13:30:00.023 74.41
tser_a -> 2016-06-04 13:30:00.075 52.10
Must be excluded.
Does pandas.Series.corr assume this behaviour by default, or should I first intersect the two time series according to the date?
Upvotes: 0
Views: 129
Reputation: 150735
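It looks like tser_a.corr(tser_b) does align the two series on their index. However, since the two series do not have exactly the same timestamps (most of them differ by a few milliseconds), only the rows whose timestamps match exactly would be paired, and you would get an unexpected result. Instead, you can resample to daily frequency first: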
tser_a.resample('D').mean().corr(tser_b.resample('D').mean())
# out -0.5522781562573792
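To make the alignment behaviour concrete, here is a minimal, self-contained sketch that rebuilds the two series from the question and compares three things: how many timestamps match exactly, the correlation after truncating each timestamp to its calendar day, and the correlation after resampling. The variable names (exact_common, a_daily, b_daily, common_days, corr_by_date, corr_resampled) are illustrative and not part of the original post; the sketch also assumes there is at most one observation per day, as in the sample data.

```python
import pandas as pd

# Sample data copied from the question.
tser_a = pd.Series(
    [50.41, 51.96, 51.98, 52.00, 52.01, 51.97, 52.01, 52.10],
    index=pd.to_datetime([
        "2016-05-25 13:30:00.023", "2016-05-26 13:30:00.023",
        "2016-05-27 13:30:00.030", "2016-05-28 13:30:00.041",
        "2016-05-29 13:30:00.048", "2016-06-02 13:30:00.049",
        "2016-06-03 13:30:00.072", "2016-06-04 13:30:00.075",
    ]),
)
tser_b = pd.Series(
    [74.41, 74.96, 74.98, 73.00, 73.01, 73.97, 72.01, 72.10],
    index=pd.to_datetime([
        "2016-05-24 13:30:00.023", "2016-05-25 13:30:00.023",
        "2016-05-26 13:30:00.030", "2016-05-27 13:30:00.041",
        "2016-05-28 13:30:00.048", "2016-05-29 13:30:00.049",
        "2016-06-02 13:30:00.072", "2016-06-03 13:30:00.075",
    ]),
)

# Timestamps that are identical down to the millisecond in both series.
# Series.corr pairs values by index label, so only these would be used.
exact_common = tser_a.index.intersection(tser_b.index)
print("exactly matching timestamps:", len(exact_common))

# Truncate every timestamp to midnight so that matching happens per day.
a_daily = tser_a.copy()
a_daily.index = a_daily.index.normalize()
b_daily = tser_b.copy()
b_daily.index = b_daily.index.normalize()

# Explicit intersection of the calendar days present in both series.
common_days = a_daily.index.intersection(b_daily.index)
corr_by_date = a_daily.loc[common_days].corr(b_daily.loc[common_days])

# Resampling route from the answer; days without data become NaN
# and are ignored by corr.
corr_resampled = tser_a.resample("D").mean().corr(tser_b.resample("D").mean())

print("days in both series:", len(common_days))
print("correlation on common days:", corr_by_date)
print("correlation after resample:", corr_resampled)
```

With one observation per day the two routes should agree, because the daily mean of a single value is that value. If a day could hold several observations, resample('D').mean() averages them, while normalize() would leave duplicate index labels that need to be aggregated before correlating.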
Upvotes: 1