Sri2110
Sri2110

Reputation: 335

Check if my time series index data has any missing values for weekdays

I have time series data from "5 Jan 2015" to "28 Dec 2018". I observed some of the working days' dates and their values are missing. How to check how many weekdays are missing in my time range? and what are those dates so that i can extrapolate the values for those dates.

Example:

Date    Price    Volume
2018-12-28  172.0   800
2018-12-27  173.6   400
2018-12-26  170.4   500
2018-12-25  171.0   2200
2018-12-21  172.8   800

On observing calendar, 21st Dec, 2018 was Friday. Then excluding Saturday and Sunday, the dataset should be having "24th Dec 2018" in the list, but its missing. I need to identify such missing dates from range.

My approach till now: I tried using

pd.date_range('2015-01-05','2018-12-28',freq='W')

to identify the number of weeks and calculate the no. of weekdays from them manually, to identify number of missing dates. But it dint solved purpose as I need to identify missing dates from range.

Upvotes: 1

Views: 1379

Answers (1)

Jarad
Jarad

Reputation: 18943

Let's say this is your full dataset:

Date    Price    Volume
2018-12-28  172.0   800
2018-12-27  173.6   400
2018-12-26  170.4   500
2018-12-25  171.0   2200
2018-12-21  172.8   800

And dates were:

dates = pd.date_range('2018-12-15', '2018-12-31')

First, make sure the Date column is actually a date type:

df['Date'] = pd.to_datetime(df['Date'])

Then set Date as the index:

df = df.set_index('Date')

Then reindex with unutbu's solution:

df = df.reindex(dates, fill_value=0.0)

Then reset the index to make it easier to work with:

df = df.reset_index()

It now looks like this:

        index  Price  Volume
0  2018-12-15    0.0     0.0
1  2018-12-16    0.0     0.0
2  2018-12-17    0.0     0.0
3  2018-12-18    0.0     0.0
4  2018-12-19    0.0     0.0
5  2018-12-20    0.0     0.0
6  2018-12-21  172.8   800.0
7  2018-12-22    0.0     0.0
8  2018-12-23    0.0     0.0
9  2018-12-24    0.0     0.0
10 2018-12-25  171.0  2200.0
11 2018-12-26  170.4   500.0
12 2018-12-27  173.6   400.0
13 2018-12-28  172.0   800.0
14 2018-12-29    0.0     0.0
15 2018-12-30    0.0     0.0
16 2018-12-31    0.0     0.0

Do:

df['weekday'] = df['index'].dt.dayofweek

Finally, how many weekdays are missing in your time range:

missing_weekdays = df[(~df['weekday'].isin([5,6])) & (df['Volume'] == 0.0)]

Result:

>>> missing_weekdays
        index  Price  Volume  weekday
2  2018-12-17    0.0     0.0        0
3  2018-12-18    0.0     0.0        1
4  2018-12-19    0.0     0.0        2
5  2018-12-20    0.0     0.0        3
9  2018-12-24    0.0     0.0        0
16 2018-12-31    0.0     0.0        0

Upvotes: 2

Related Questions