rahul

Reputation: 147

Automating interpolation of missing values in pandas dataframe

I have a dataframe with airline booking data for the past year for a particular origin and destination. There are hundreds of similar data-sets in the system.

Each data-set has gaps. In the current example, booking data is missing for about 85 days of the year.

There are two columns here - departure_date and bookings.

My next step is to add the missing dates to the departure_date column and set the corresponding values in the bookings column to NaN.

I am looking for the best way to do this.

Please find a part of the dataFrame below:

Index       departure_date              bookings
0           2017-11-02 00:00:00             43
1           2017-11-03 00:00:00             27
2           2017-11-05 00:00:00             27 ********
3           2017-11-06 00:00:00             22
4           2017-11-07 00:00:00             39
.
.
164         2018-05-22 00:00:00             17
165         2018-05-23 00:00:00             41
166         2018-05-24 00:00:00             73
167         2018-07-02 00:00:00             4  *********
168         2018-07-03 00:00:00             31
.
.
277         2018-10-31 00:00:00             50
278         2018-11-01 00:00:00             60

We can see that the data-set is for a one year period (Nov 2, 2017 to Nov 1, 2018). But we have data for 279 days only. For example, we don't have any data between 2018-05-25 and 2018-07-01. I would have to include these dates in the departure_date column and set the corresponding booking values to NaN.

For the second step, I plan to do some interpolation using something like

dataFrame['bookings'].interpolate(method='time', inplace=True)

Please suggest if there are better alternatives in Python.

Upvotes: 0

Views: 247

Answers (1)

Ludo Schmidt

Reputation: 1403

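Resample the data to daily frequency, then fill the gaps. resample needs a DatetimeIndex, so set departure_date as the index first. ffill (formerly pad) carries the previous day's value forward:

dataFrame.set_index('departure_date')['bookings'].resample('D').ffill()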
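
If the goal is to get NaN rows for the missing dates first, as the question describes, the resampler's asfreq() may be closer to what is wanted than a forward fill. A minimal sketch, assuming a small made-up sample in the question's layout (the values and the variable names df, daily, with_nan and padded are illustrative only):

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout; 2017-11-04 is missing.
df = pd.DataFrame({
    "departure_date": pd.to_datetime(
        ["2017-11-02", "2017-11-03", "2017-11-05", "2017-11-06"]),
    "bookings": [43, 27, 27, 22],
})

# resample works on a DatetimeIndex, so index by date first.
daily = df.set_index("departure_date")["bookings"].resample("D")

with_nan = daily.asfreq()  # missing days are inserted with NaN
padded = daily.ffill()     # missing days take the previous day's value
```

Here with_nan keeps the gaps visible for a later interpolation step, while padded fills them immediately.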

The resample documentation lists other fill methods, so you can pick the one that best fits your needs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html
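
For the interpolation step mentioned in the question, method='time' only works when the Series has a DatetimeIndex. One possible sketch, again on made-up sample values (full_range, series and filled are illustrative names), reindexes onto a complete daily range and then interpolates:

```python
import pandas as pd

# Hypothetical sample; 2017-11-04 is absent from the raw data.
df = pd.DataFrame({
    "departure_date": pd.to_datetime(
        ["2017-11-02", "2017-11-03", "2017-11-05", "2017-11-06"]),
    "bookings": [43, 27, 31, 22],
})

# Build every calendar day between the first and last departure date.
full_range = pd.date_range(df["departure_date"].min(),
                           df["departure_date"].max(), freq="D")

# Reindexing inserts the missing dates with NaN bookings.
series = df.set_index("departure_date")["bookings"].reindex(full_range)

# Time-weighted linear interpolation across the NaN rows.
filled = series.interpolate(method="time")
```

Across a long gap such as 2018-05-25 to 2018-07-01, this draws a straight line between the two surrounding values, which may or may not be realistic for bookings. The limit parameter of interpolate caps how many consecutive NaN values are filled, if long gaps should stay empty.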

Upvotes: 1
