SubZeno

Reputation: 380

Compute the correlation between the intersection of two timeseries with pandas.Series

Considering two timeseries as pandas.Series:

tser_a:

date

2016-05-25 13:30:00.023   50.41
2016-05-26 13:30:00.023   51.96
2016-05-27 13:30:00.030   51.98
2016-05-28 13:30:00.041   52.00
2016-05-29 13:30:00.048   52.01
2016-06-02 13:30:00.049   51.97
2016-06-03 13:30:00.072   52.01
2016-06-04 13:30:00.075   52.10

tser_b:

date

2016-05-24 13:30:00.023   74.41
2016-05-25 13:30:00.023   74.96
2016-05-26 13:30:00.030   74.98
2016-05-27 13:30:00.041   73.00
2016-05-28 13:30:00.048   73.01
2016-05-29 13:30:00.049   73.97
2016-06-02 13:30:00.072   72.01
2016-06-03 13:30:00.075   72.10

I would like to calculate the correlation between these two timeseries.

Pandas does offer the pandas.Series.corr (ref) function to compute such a value.

corr = tser_a.corr(tser_b)

My question:

I need to be sure that the correlation pairs values by exact date, i.e. that it considers only the intersection of tser_a and tser_b.

As pseudocode:

for each date_x:
    if (tser_a[date_x] IS NOT NIL) AND (tser_b[date_x] IS NOT NIL):
        consider(tser_a[date_x], tser_b[date_x])
    else:
        skip and go ahead

The following rows, each present in only one of the two series, must then be excluded:

tser_b -> 2016-05-24 13:30:00.023   74.41
tser_a -> 2016-06-04 13:30:00.075   52.10

Does pandas.Series.corr behave this way by default, or should I first intersect the two timeseries according to the date?

Upvotes: 0

Views: 129

Answers (1)

Quang Hoang

Reputation: 150735

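tser_a.corr(tser_b) does align the two series on their index, so only timestamps present in both are used. However, the timestamps in your data differ at the millisecond level, so very few rows line up exactly and the result will not be what you expect. Instead, you can resample to daily frequency first: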

tser_a.resample('D').mean().corr(tser_b.resample('D').mean())
# out -0.5522781562573792
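
To see what is going on, here is a minimal sketch that rebuilds both series from the sample data in the question (names such as exact_a, daily_a and common_a are only illustrative). It counts how many timestamps match exactly, then compares the resample approach with an explicit intersection on the calendar day, using DatetimeIndex.normalize and Series.align with join="inner". It assumes each series has at most one observation per day.

```python
import pandas as pd

# Rebuild the two series from the sample data in the question.
tser_a = pd.Series(
    [50.41, 51.96, 51.98, 52.00, 52.01, 51.97, 52.01, 52.10],
    index=pd.to_datetime([
        "2016-05-25 13:30:00.023", "2016-05-26 13:30:00.023",
        "2016-05-27 13:30:00.030", "2016-05-28 13:30:00.041",
        "2016-05-29 13:30:00.048", "2016-06-02 13:30:00.049",
        "2016-06-03 13:30:00.072", "2016-06-04 13:30:00.075",
    ]),
)
tser_b = pd.Series(
    [74.41, 74.96, 74.98, 73.00, 73.01, 73.97, 72.01, 72.10],
    index=pd.to_datetime([
        "2016-05-24 13:30:00.023", "2016-05-25 13:30:00.023",
        "2016-05-26 13:30:00.030", "2016-05-27 13:30:00.041",
        "2016-05-28 13:30:00.048", "2016-05-29 13:30:00.049",
        "2016-06-02 13:30:00.072", "2016-06-03 13:30:00.075",
    ]),
)

# Series.corr pairs values by index label, so only timestamps that are
# identical down to the millisecond contribute.
exact_a, exact_b = tser_a.align(tser_b, join="inner")
print("exactly matching timestamps:", len(exact_a))

# Explicit intersection by calendar day: drop the time of day, then keep
# only the days present in both series.
daily_a = tser_a.copy()
daily_a.index = daily_a.index.normalize()
daily_b = tser_b.copy()
daily_b.index = daily_b.index.normalize()
common_a, common_b = daily_a.align(daily_b, join="inner")
corr_daily = common_a.corr(common_b)

# The resample approach: empty days become NaN, and corr skips any pair
# in which either value is NaN.
corr_resampled = tser_a.resample("D").mean().corr(tser_b.resample("D").mean())

print("shared days:", len(common_a))
print("corr (explicit daily intersection):", corr_daily)
print("corr (resample):", corr_resampled)
```

With this data only one timestamp matches exactly, so a plain tser_a.corr(tser_b) has a single pair to work with and cannot give a meaningful correlation. Both daily variants use the same shared days and should agree. If a series can have several observations on one day, resample('D').mean() averages them, whereas normalize() would leave duplicate index labels, so the resample version is the safer of the two.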

Upvotes: 1
