Reputation: 717
please consider the following dataframe with daily dates as its index
df1= pd.date_range(start_date, end_date)
df1 = pd.DataFrame(index=date_range, columns=['A', 'B'])
now I have a second dataframe df2 where df2.index is a subset of df1.index I want to join the data from df2 into df1 and for the missing indices I want to have NAN. In a second step I want to replace the NaN with the last available data like this:
2004-03-28 5
2004-03-30 NaN
2004-03-31 NaN
2004-04-01 7
should become
2004-03-28 5
2004-03-30 5
2004-03-31 5
2004-04-01 7
many thanks for your help
Upvotes: 2
Views: 237
Reputation: 394469
Assuming that you have common index and just a single column that is named the same in both dataframes:
First merge
df1 = df1.merge(df2, how='left')
Now fill the missing values using 'ffill' which means forwards fill:
df1 = df1.fillna(method='ffill')
In the situation where the columns are not named the same you can either rename the columns:
right.rename(columnss={'old_name':'new_name'},inplace=True)
or specify the columns from both left and right hand side to merge with:
df1.merge(df2, left_on='left_col', right='right_col', how='left')
if the indexes don't match then you have to set left_index=False
and right_index=False
Upvotes: 1