azuric
azuric

Reputation: 2839

Join/Merge two pandas dataframes and filling

I have two pandas dataframes both holding irregular timeseries data.

I want merge/join the two frames by time.

I also want to forward fill the other columns of frame2 for any "new" rows that were added through the joining process. How can I do this?

I have tried:

df = pd.merge(df1, df2, on="DateTime")

but this just leave a frame with matching timestamp rows.

I would be grateful for any ideas!

Upvotes: 9

Views: 18125

Answers (2)

Minkoo Seo
Minkoo Seo

Reputation: 101

Left join then fillna doesn't fully utilize the data in right. We usually expect ffill from the most recent data of the right.

See that we get full of NaNs in the below although df2 has data on 2, 4, and 6 which can be good approximate of its value on 3, 5, and 7.

In [47]: df1 = pd.DataFrame({'a': [1, 3, 5, 7]}, index=[1, 3, 5, 7])
    ...: df2 = pd.DataFrame({'b': [2, 4, 6, 8]}, index=[2, 4, 6, 8])

In [48]: pd.merge(df1, df2, how='left', left_index=True, right_index=True).ffill()
Out[48]: 
   a   b
1  1 NaN
3  3 NaN
5  5 NaN
7  7 NaN

Instead, ffill first and then join. See that we're getting the data from the most recent date of the df2.

In [50]: new_df2 = df2.reindex(df1.index, method='ffill')

In [51]: new_df2
Out[51]: 
     b
1  NaN
3  2.0
5  4.0
7  6.0

And then join:

In [52]: pd.merge(df1, new_df2, how='left', left_index=True, right_index=True)
Out[52]: 
   a    b
1  1  NaN
3  3  2.0
5  5  4.0
7  7  6.0

Upvotes: 0

chrisb
chrisb

Reputation: 52276

Try this. The how='left' will have the merge keep all records of df1, and the fillna will populate missing values.

df = pd.merge(df1, df2, on='DateTime', how='left').fillna(method='ffill')

Upvotes: 18

Related Questions