Reputation: 13
I have two dataframes, df1 and df2, with the same column names and timestamps as their indices. I want to concatenate the two dataframes while merging rows that share an index, giving preference to the values stored in df2. This is poorly worded, so see the example below:
>>> df1= TimeStamp A_Output B_Output C_Output
00:00:00 20 15 5
00:00:06 20 NaN 3
00:00:15 15 6 NaN
00:00:20 20 NaN 5
00:00:30 25 14 10
>>> df2= TimeStamp A_Output B_Output C_Output
00:00:00 15 5 8
00:00:04 16 NaN NaN
00:00:06 17 NaN NaN
00:00:15 NaN NaN 2
00:00:18 19 NaN NaN
00:00:21 14 NaN NaN
00:00:26 32 NaN 5
>>> df3= TimeStamp A_Output B_Output C_Output
00:00:00 15 5 8
00:00:04 16 NaN NaN
00:00:06 17 NaN 3
00:00:15 15 6 2
00:00:18 19 NaN NaN
00:00:20 20 NaN 5
00:00:21 14 NaN NaN
00:00:26 32 NaN 5
00:00:30 25 14 10
df3 is what I would like to achieve. It has a row for every timestamp that appears in either df1 or df2. For each index the two share, the value from df2 is taken wherever it is not NaN; otherwise the value stored in df1 is kept.
df1 >>> 00:00:15 15 6 NaN
df2 >>> 00:00:15 NaN NaN 2
df3 >>> 00:00:15 15 6 2
df1 >>> 00:00:00 20 15 5
df2 >>> 00:00:00 15 5 8
df3 >>> 00:00:00 15 5 8
See the examples above for clarification. I really can't find a way to do this. For reference, each dataframe has around 90 columns and 100k+ rows.
Upvotes: 1
Views: 122
Reputation: 23099
Try combine_first. It keeps the values from df2 and fills any NaN cells or missing rows from df1:
df3 = df2.combine_first(df1)
print(df3)
A_Output B_Output C_Output
TimeStamp
00:00:00 15.0 5.0 8.0
00:00:04 16.0 NaN NaN
00:00:06 17.0 NaN 3.0
00:00:15 15.0 6.0 2.0
00:00:18 19.0 NaN NaN
00:00:20 20.0 NaN 5.0
00:00:21 14.0 NaN NaN
00:00:26 32.0 NaN 5.0
00:00:30 25.0 14.0 10.0
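If you want to reproduce this, here is a minimal sketch that rebuilds the two frames from the question and runs the same call. The timestamps are stored as plain strings here, which is an assumption on my part (your real index may be datetime or timedelta values), and the names cols and manual are just for illustration. The last step shows what combine_first roughly amounts to: reindex both frames to the union of their indexes, then fill the gaps in df2 from df1.

```python
import numpy as np
import pandas as pd

nan = np.nan
cols = ["A_Output", "B_Output", "C_Output"]

# Sample data copied from the question; string timestamps are an assumption.
df1 = pd.DataFrame(
    [[20, 15, 5], [20, nan, 3], [15, 6, nan], [20, nan, 5], [25, 14, 10]],
    index=pd.Index(
        ["00:00:00", "00:00:06", "00:00:15", "00:00:20", "00:00:30"],
        name="TimeStamp",
    ),
    columns=cols,
)
df2 = pd.DataFrame(
    [[15, 5, 8], [16, nan, nan], [17, nan, nan], [nan, nan, 2],
     [19, nan, nan], [14, nan, nan], [32, nan, 5]],
    index=pd.Index(
        ["00:00:00", "00:00:04", "00:00:06", "00:00:15",
         "00:00:18", "00:00:21", "00:00:26"],
        name="TimeStamp",
    ),
    columns=cols,
)

# df2 takes priority; NaN cells and rows missing from df2 come from df1.
df3 = df2.combine_first(df1)

# Roughly the same thing spelled out by hand (illustrative only).
union = df1.index.union(df2.index)
manual = df2.reindex(union).fillna(df1.reindex(union))

print(df3)
```

The values come back as floats because NaN forces the columns to a floating-point dtype. Note that the result also contains the 00:00:20 row, since that timestamp exists in df1.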
Upvotes: 2