ben filippi

Reputation: 68

detecting jumps on pandas index dates

I managed to load historical data series for a large set of financial instruments, indexed by date.

I am plotting volume and price information without any issue.

What I want to achieve now is to determine if there is any big jump in dates, to see if I am missing large chunks of data.

My idea was to plot the difference between consecutive dates in the index: if that difference is greater than 3 or 4 days (which is longer than a weekend plus a bank holiday on a Friday or Monday), then there is an issue.

The problem is that I can't figure out how to simply compute df[next day] - df[day], where df is indexed by day.

Upvotes: 1

Views: 2446

Answers (1)

Andy Hayden

Reputation: 375685

You can use the Series shift method (note that the DatetimeIndex shift method behaves differently: it shifts the dates by freq):

In [11]: rng = pd.DatetimeIndex(['20120101', '20120102', '20120106']) # DatetimeIndex like df.index

In [12]: s = pd.Series(rng)  # df.index instead of rng

In [13]: s - s.shift()
Out[13]:
0                NaT
1   1 days, 00:00:00
2   4 days, 00:00:00
dtype: timedelta64[ns]

In [14]: s - s.shift() > pd.offsets.Day(3).nanos
Out[14]:
0    False
1    False
2     True
dtype: bool

Depending on what you want, you could either use any to check whether there is a jump at all, or select the problematic values:

In [15]: (s - s.shift() > pd.offsets.Day(3).nanos).any()
Out[15]: True

In [16]: s[s - s.shift() > pd.offsets.Day(3).nanos]
Out[16]:
2   2012-01-06 00:00:00
dtype: datetime64[ns]
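
For reference, here is a minimal sketch of the same check applied directly to a DataFrame's index. The helper name find_date_gaps, the max_gap parameter and the sample frame are my own illustrations, not part of the session above. It assumes a reasonably recent pandas, where the threshold is compared as a pd.Timedelta rather than as integer nanoseconds, and it uses diff(), which computes the same thing as s - s.shift().

```python
import pandas as pd


def find_date_gaps(index, max_gap=pd.Timedelta(days=3)):
    """Return one row per jump in `index` that is larger than `max_gap`.

    Hypothetical helper: `index` is expected to be a sorted DatetimeIndex.
    """
    dates = index.to_series()
    gaps = dates.diff()      # same as dates - dates.shift()
    mask = gaps > max_gap    # the leading NaT compares as False
    return pd.DataFrame({
        "previous": dates.shift()[mask],  # last date before the jump
        "current": dates[mask],           # first date after the jump
        "gap": gaps[mask],
    })


# Hypothetical frame standing in for df
df = pd.DataFrame(
    {"price": [10.0, 10.5, 11.0]},
    index=pd.DatetimeIndex(["2012-01-01", "2012-01-02", "2012-01-06"]),
)

jumps = find_date_gaps(df.index)
has_jump = not jumps.empty
print(jumps)
```

Returning both ends of each jump makes it easier to see which date range is missing, rather than only the date at which the data resumes.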

Or perhaps find the maximum jump (and where it is):

In [17]: (s - s.shift()).max()  # it's weird this returns a Series...
Out[17]:
0   4 days, 00:00:00
dtype: timedelta64[ns]

In [18]: (s - s.shift()).idxmax()
Out[18]: 2
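
As a rough sketch of the same idea with the dates kept as the index (an assumption on my part; the session above uses a default integer index), idxmax then gives the date at which the largest jump ends instead of a position. In recent pandas versions max() should return a single Timedelta here.

```python
import pandas as pd

# Sample dates standing in for df.index
index = pd.DatetimeIndex(["2012-01-01", "2012-01-02", "2012-01-06"])

# Series of gaps, labelled by the date at which each gap ends
gaps = index.to_series().diff()

largest_gap = gaps.max()   # the leading NaT is skipped
gap_end = gaps.idxmax()    # date label where the largest gap ends

print(largest_gap, gap_end)
```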

If you really wanted to plot this, simply plotting the difference would work:

(s - s.shift()).plot()
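
If the timedelta axis is awkward to read, one option is to convert the gaps to whole days before plotting. This is only a sketch: the helper name gap_days is hypothetical, and it assumes the index holds daily dates, so truncating to whole days loses nothing.

```python
import pandas as pd


def gap_days(index):
    """Return the gap to the previous date, in whole days, labelled by date.

    Hypothetical helper; the first date has no predecessor and is dropped.
    """
    gaps = index.to_series().diff().dropna()
    return gaps.dt.days


# Sample dates standing in for df.index
index = pd.DatetimeIndex(["2012-01-01", "2012-01-02", "2012-01-06"])
days = gap_days(index)
print(days)
```

Calling days.plot() (or gap_days(df.index).plot() on real data) should then show spikes wherever a chunk of dates is missing, assuming matplotlib is installed.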

Upvotes: 2
