I have loaded historical data series for a large set of financial instruments, indexed by date.
I can plot volume and price information without any issue.
What I want to do now is determine whether there are any big jumps between dates, to see if I am missing large chunks of data.
My idea was to plot the difference between each pair of consecutive dates in the index: if the difference is greater than 3 or 4 days (which is longer than a weekend plus a bank holiday on a Friday or Monday), then there is an issue.
The problem is that I can't figure out how to compute something as simple as df[next day] - df[day], where df is indexed by day.
You can use the Series shift method (note that DatetimeIndex.shift behaves differently: it shifts the dates themselves by the index's freq rather than moving values by position):
In [10]: import pandas as pd
In [11]: rng = pd.DatetimeIndex(['20120101', '20120102', '20120106'])  # a DatetimeIndex like df.index
In [12]: s = pd.Series(rng)  # use df.index here instead of rng
In [13]: s - s.shift()
Out[13]:
0 NaT
1 1 days, 00:00:00
2 4 days, 00:00:00
dtype: timedelta64[ns]
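Applied directly to a DataFrame, a minimal sketch might look like the following, assuming df has a sorted, tz-naive DatetimeIndex (the price column and its values are made up for illustration). Series.diff() computes the same thing as s - s.shift():

```python
import pandas as pd

# Hypothetical sample data: a DataFrame indexed by date, like the one in the question
df = pd.DataFrame(
    {"price": [10.0, 10.5, 10.2, 11.0]},
    index=pd.DatetimeIndex(["2012-01-02", "2012-01-03", "2012-01-09", "2012-01-10"]),
)

# to_series() turns the index into a Series whose values are the dates themselves;
# diff() then subtracts the previous date from each date, i.e. s - s.shift()
gaps = df.index.to_series().diff()
print(gaps)
```

The first value is NaT because the first date has no predecessor.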
In [14]: s - s.shift() > pd.Timedelta(days=3)
Out[14]:
0 False
1 False
2 True
dtype: bool
Depending on what you need, you can either check whether any gap exceeds the threshold with any(), or select the problematic dates:
In [15]: (s - s.shift() > pd.Timedelta(days=3)).any()
Out[15]: True
In [16]: s[s - s.shift() > pd.Timedelta(days=3)]
Out[16]:
2 2012-01-06 00:00:00
dtype: datetime64[ns]
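A fixed threshold of 3 or 4 calendar days can still hide a missing Tuesday to Thursday, or flag a long weekend. One alternative, sketched below under stated assumptions, is to count skipped business days with numpy.busday_count, which counts Monday to Friday days in the half-open range [previous, current) minus any holidays you pass. The helper name business_day_gaps is hypothetical, the index is assumed sorted and tz-naive, and 2012-01-16 is listed purely as an example bank holiday; you would supply your exchange's real holiday calendar.

```python
import numpy as np
import pandas as pd

def business_day_gaps(index, holidays=()):
    """Hypothetical helper: for each date after the first, return how many
    business days were skipped since the previous date (0 = consecutive)."""
    days = index.values.astype("datetime64[D]")
    hol = np.array(holidays, dtype="datetime64[D]")
    # Counts Mon-Fri days (excluding holidays) in [previous, current)
    counts = np.busday_count(days[:-1], days[1:], holidays=hol)
    return pd.Series(counts - 1, index=index[1:])

# Thu, Fri, Mon, then a jump to the following Tuesday
idx = pd.DatetimeIndex(["2012-01-05", "2012-01-06", "2012-01-09", "2012-01-17"])
# 2012-01-16 is an assumed bank holiday, for illustration only
skipped = business_day_gaps(idx, holidays=["2012-01-16"])
print(skipped[skipped > 0])
```

Here the Friday to Monday step counts as 0 skipped days, while the jump to 2012-01-17 reports the four missing weekdays of 10 to 13 January.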
Or perhaps find the maximum jump (and where it is):
In [17]: (s - s.shift()).max()  # older pandas versions returned a one-element Series here
Out[17]: Timedelta('4 days 00:00:00')
In [18]: (s - s.shift()).idxmax()
Out[18]: 2
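If you want more than the single largest jump, a small report of the biggest gaps with the dates on either side can be easier to act on. This is a sketch assuming a sorted DatetimeIndex; largest_gaps is a hypothetical helper name, and the sample dates are made up:

```python
import pandas as pd

def largest_gaps(index, n=5):
    """Hypothetical helper: return the n largest gaps between consecutive
    dates, with the date before ('start') and after ('end') each gap."""
    dates = index.to_series().reset_index(drop=True)
    report = pd.DataFrame({
        "start": dates.shift(),  # previous date
        "end": dates,            # current date
        "gap": dates.diff(),     # end - start
    })
    # Drop the first row (no previous date), then sort by gap length
    return report.dropna().sort_values("gap", ascending=False).head(n)

idx = pd.DatetimeIndex(["2012-01-02", "2012-01-03", "2012-01-09",
                        "2012-01-10", "2012-01-20"])
print(largest_gaps(idx, n=2))
```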
If you really want to plot this, plotting the gap lengths in days works (the .dt.days accessor converts each timedelta to a number of days, with NaN for the first row):
(s - s.shift()).dt.days.plot()