Reputation: 335
I have time series data from "5 Jan 2015" to "28 Dec 2018". I observed some of the working days' dates and their values are missing. How to check how many weekdays are missing in my time range? and what are those dates so that i can extrapolate the values for those dates.
Example:
Date Price Volume
2018-12-28 172.0 800
2018-12-27 173.6 400
2018-12-26 170.4 500
2018-12-25 171.0 2200
2018-12-21 172.8 800
On observing calendar, 21st Dec, 2018 was Friday. Then excluding Saturday and Sunday, the dataset should be having "24th Dec 2018" in the list, but its missing. I need to identify such missing dates from range.
My approach till now: I tried using
pd.date_range('2015-01-05','2018-12-28',freq='W')
to identify the number of weeks and calculate the no. of weekdays from them manually, to identify number of missing dates. But it dint solved purpose as I need to identify missing dates from range.
Upvotes: 1
Views: 1379
Reputation: 18943
Let's say this is your full dataset:
Date Price Volume
2018-12-28 172.0 800
2018-12-27 173.6 400
2018-12-26 170.4 500
2018-12-25 171.0 2200
2018-12-21 172.8 800
And dates
were:
dates = pd.date_range('2018-12-15', '2018-12-31')
First, make sure the Date column is actually a date type:
df['Date'] = pd.to_datetime(df['Date'])
Then set Date as the index:
df = df.set_index('Date')
Then reindex with unutbu's solution:
df = df.reindex(dates, fill_value=0.0)
Then reset the index to make it easier to work with:
df = df.reset_index()
It now looks like this:
index Price Volume
0 2018-12-15 0.0 0.0
1 2018-12-16 0.0 0.0
2 2018-12-17 0.0 0.0
3 2018-12-18 0.0 0.0
4 2018-12-19 0.0 0.0
5 2018-12-20 0.0 0.0
6 2018-12-21 172.8 800.0
7 2018-12-22 0.0 0.0
8 2018-12-23 0.0 0.0
9 2018-12-24 0.0 0.0
10 2018-12-25 171.0 2200.0
11 2018-12-26 170.4 500.0
12 2018-12-27 173.6 400.0
13 2018-12-28 172.0 800.0
14 2018-12-29 0.0 0.0
15 2018-12-30 0.0 0.0
16 2018-12-31 0.0 0.0
Do:
df['weekday'] = df['index'].dt.dayofweek
Finally, how many weekdays are missing in your time range:
missing_weekdays = df[(~df['weekday'].isin([5,6])) & (df['Volume'] == 0.0)]
Result:
>>> missing_weekdays
index Price Volume weekday
2 2018-12-17 0.0 0.0 0
3 2018-12-18 0.0 0.0 1
4 2018-12-19 0.0 0.0 2
5 2018-12-20 0.0 0.0 3
9 2018-12-24 0.0 0.0 0
16 2018-12-31 0.0 0.0 0
Upvotes: 2