Reputation: 29
I have a dataset whose index consists of timestamps. It is a pandas Series like the one below:
Time
2013-09-17 22:08:11 0
2013-09-17 22:08:18 0
2013-09-17 22:08:26 0
2013-09-17 22:08:34 0
2013-09-17 22:08:42 0
2013-09-17 22:08:50 0
2013-09-17 22:08:58 0
2013-09-17 22:09:06 0
2013-09-17 22:09:11 0
2013-09-17 22:09:13 0
2013-09-17 22:09:19 0
2013-09-17 22:09:21 0
2013-09-17 22:09:27 0
2013-09-17 22:09:35 0
2013-09-17 22:09:43 0
Name: dummy_frame, dtype: float64
The data are recorded at irregular intervals. I want to check whether there is a date skip or jump in the data, for example from 2013-09-07 to 2013-12-22. I can detect that a jump exists simply by comparing the first and last dates, but I need to find where the jump occurs. Is there an easy way to find it?
Thank you.
Upvotes: 0
Views: 793
Reputation: 4618
IIUC:
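import pandas as pd
# x is your series
x.index = pd.to_datetime(x.index)
# Difference in whole calendar days between each timestamp and the previous one
jumps = x.index.to_series().dt.normalize().diff().dt.days
This creates a series where jumps[i] is the number of days between the dates at positions i and i-1 (the first value is NaN). To find the rows where the jump is more than one day, just do:
x[jumps > 1]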
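For illustration, here is a minimal self-contained sketch of the consecutive-difference idea on made-up data. The sample timestamps, the one-day threshold, and the names `gaps`, `is_jump` and `jump_report` are assumptions for the example, not part of the original data:

```python
import pandas as pd

# Hypothetical sample data: the jump sits between the 2nd and 3rd rows.
idx = pd.to_datetime([
    "2013-09-07 22:08:11",
    "2013-09-07 22:08:18",
    "2013-12-22 10:00:00",
    "2013-12-22 10:00:08",
])
x = pd.Series(0.0, index=idx, name="dummy_frame")

# Time elapsed since the previous timestamp (NaT for the first row).
gaps = x.index.to_series().diff()

# The threshold is an assumption; pick whatever counts as a "jump" for your data.
threshold = pd.Timedelta(days=1)
is_jump = gaps > threshold

# One row per jump: indexed by the first record after the gap,
# with the last timestamp seen before it and the size of the gap.
jump_report = pd.DataFrame({
    "previous": x.index.to_series().shift(1)[is_jump],
    "gap": gaps[is_jump],
})
print(jump_report)
```

Comparing full timestamps rather than calendar dates means a gap is measured in elapsed time; if you only care about skipped calendar days, normalize the index to midnight first.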
Upvotes: 1
Reputation: 1473
I believe you could simply create a date range with the same date format and compare both lists:
from datetime import datetime, timedelta
start_date = datetime.strptime("2013-09-07", "%Y-%m-%d")
end_date = datetime.strptime("2013-12-22", "%Y-%m-%d")
# This will create a list with every date in the range
completeDates = [start_date + timedelta(days=x) for x in range(0, (end_date - start_date).days + 1)]
completeDates = [d.strftime("%Y-%m-%d") for d in completeDates]  # Convert dates to strings
# Get your list from the data frame index, then remove the hours
myDates = dummy_frame.index.tolist()
# Your dates may be datetime objects or strings; run only the line that applies
# If strings:
# myDates = [d.split()[0] for d in myDates]
# If datetime objects:
myDates = [d.strftime("%Y-%m-%d") for d in myDates]
# Create a list with the missing dates
missingDates = [d for d in completeDates if d not in myDates]
missingDates will then be a list containing all the missing dates, that is, the jumps in your data frame. Please let me know if this helps!
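The same comparison can also be sketched with pandas alone, assuming the index is (or can be converted to) a DatetimeIndex. The sample timestamps and the names `all_days`, `present_days` and `missing_days` are made up for the example:

```python
import pandas as pd

# Hypothetical sample data with two calendar days missing (09-09 and 09-10).
idx = pd.to_datetime([
    "2013-09-07 22:08:11",
    "2013-09-08 01:00:00",
    "2013-09-11 09:30:00",
])
x = pd.Series(0.0, index=idx, name="dummy_frame")

# Every calendar day between the first and last record.
all_days = pd.date_range(
    x.index.min().normalize(), x.index.max().normalize(), freq="D"
)

# Days on which at least one record exists (time of day dropped).
present_days = x.index.normalize().unique()

# Days in the full range that never appear in the data.
missing_days = all_days.difference(present_days)
print(missing_days)
```

Here the range is taken from the data itself rather than from hard-coded start and end dates; fixed dates can be passed to `pd.date_range` instead if the expected period is known.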
Upvotes: 0