Steve D.
Steve D.

Reputation: 316

Forward filling missing dates into Python Pandas Dataframe

I have a Panda's dataframe that is filled as follows:

ref_date    tag
1/29/2010   1
2/26/2010   3
3/31/2010   4
4/30/2010   4
5/31/2010   1
6/30/2010   3
8/31/2010   1
9/30/2010   4
12/31/2010  2

Note how there are missing months (i.e. 7, 10, 11) in the data. I want to fill in the missing data through a forward filling method so that it looks like this:

ref_date    tag
1/29/2010   1
2/26/2010   3
3/31/2010   4
4/30/2010   4
5/31/2010   1
6/30/2010   3
7/30/2010   3
8/31/2010   1
9/30/2010   4
10/29/2010  4
11/30/2010  4
12/31/2010  2

The tag of the missing date will have the tag of the previous. All dates represent the last business day of the month.

This is what I tried to do:

idx = pd.date_range(start='1/29/2010', end='12/31/2010', freq='BM')
df.ref_date.index = pd.to_datetime(df.ref_date.index)
df = df.reindex(index=[idx], columns=[ref_date], method='ffill')

It's giving me the error:

TypeError: Cannot compare type 'Timestamp' with type 'int'

where pd is pandas and df is the dataframe.

I'm new to Pandas Dataframe, so any help would be appreciated!

Upvotes: 3

Views: 4152

Answers (2)

Alexander
Alexander

Reputation: 109736

You were very close, you just need to set the dataframe's index with the ref_date, reindex it to the business day month end index while specifying ffill at the method, then reset the index and rename back to the original:

# First ensure the dates are Pandas Timestamps.
df['ref_date'] = pd.to_datetime(df['ref_date'])

# Create a monthly index.
idx_monthly = pd.date_range(start='1/29/2010', end='12/31/2010', freq='BM')

# Reindex to the daily index, forward fill, reindex to the monthly index.
>>> (df
     .set_index('ref_date')
     .reindex(idx_monthly, method='ffill')
     .reset_index()
     .rename(columns={'index': 'ref_date'}))
     ref_date  tag
0  2010-01-29  1.0
1  2010-02-26  3.0
2  2010-03-31  4.0
3  2010-04-30  4.0
4  2010-05-31  1.0
5  2010-06-30  3.0
6  2010-07-30  3.0
7  2010-08-31  1.0
8  2010-09-30  4.0
9  2010-10-29  4.0
10 2010-11-30  4.0
11 2010-12-31  2.0

Upvotes: 1

Steve D.
Steve D.

Reputation: 316

Thanks to the previous person that answered this question but deleted his answer. I got the solution:

df[ref_date] = pd.to_datetime(df[ref_date])
idx = pd.date_range(start='1/29/2010', end='12/31/2010', freq='BM')
df = df.set_index(ref_date).reindex(idx).ffill().reset_index().rename(columns={'index': ref_date})

Upvotes: 0

Related Questions