Reputation: 2839
I have two pandas dataframes both holding irregular timeseries data.
I want merge/join the two frames by time.
I also want to forward fill the other columns of frame2 for any "new" rows that were added through the joining process. How can I do this?
I have tried:
df = pd.merge(df1, df2, on="DateTime")
but this just leave a frame with matching timestamp rows.
I would be grateful for any ideas!
Upvotes: 9
Views: 18125
Reputation: 101
Left join then fillna doesn't fully utilize the data in right
. We usually expect ffill from the most recent data of the right
.
See that we get full of NaNs in the below although df2
has data on 2, 4, and 6 which can be good approximate of its value on 3, 5, and 7.
In [47]: df1 = pd.DataFrame({'a': [1, 3, 5, 7]}, index=[1, 3, 5, 7])
...: df2 = pd.DataFrame({'b': [2, 4, 6, 8]}, index=[2, 4, 6, 8])
In [48]: pd.merge(df1, df2, how='left', left_index=True, right_index=True).ffill()
Out[48]:
a b
1 1 NaN
3 3 NaN
5 5 NaN
7 7 NaN
Instead, ffill first and then join. See that we're getting the data from the most recent date of the df2.
In [50]: new_df2 = df2.reindex(df1.index, method='ffill')
In [51]: new_df2
Out[51]:
b
1 NaN
3 2.0
5 4.0
7 6.0
And then join:
In [52]: pd.merge(df1, new_df2, how='left', left_index=True, right_index=True)
Out[52]:
a b
1 1 NaN
3 3 2.0
5 5 4.0
7 7 6.0
Upvotes: 0
Reputation: 52276
Try this. The how='left'
will have the merge keep all records of df1, and the fillna
will populate missing values.
df = pd.merge(df1, df2, on='DateTime', how='left').fillna(method='ffill')
Upvotes: 18