Stacey
Stacey

Reputation: 5107

Two dataframes into one

I am not sure if this is possible. I have two dataframes df1 and df2 which are presented like this:

df1             df2             

id value        id value      
 a     5         a   3         
 c     9         b   7         
 d     4         c   6         
 f     2         d   8         
                 e   2         
                 f   1         

They will have many more entries in reality than presented here. I would like to create a third dataframe df3 based on the values in df1 and df2. Any values in df1 would take precedence over values in df2 when writing to df3 (if the same id is present in both df1 and df2) so in this example I would return:

df3
id value
a      5
b      7
c      9
d      4 
e      2
f      2

I have tried using df2 as the base (df2 will have all of the id's present for the whole universe) and then overwriting the value for id's that are present in df1, but cannot find the merge syntax to do this.

Upvotes: 0

Views: 72

Answers (1)

unutbu
unutbu

Reputation: 880359

You could use combine_first, provided that you first make the DataFrame index id (so that the values get aligned by id):

In [80]: df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
Out[80]: 
  id  value
0  a    5.0
1  b    7.0
2  c    9.0
3  d    4.0
4  e    2.0
5  f    2.0

Since you mentioned merging, you might be interested in seeing that you could merge df1 and df2 on id, and then use fillna to replace NaNs in df1's the value column with values from df2's value column:

df1 = pd.DataFrame({'id': ['a', 'c', 'd', 'f'], 'value': [5, 9, 4, 2]})
df2 = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e', 'f'], 'value': [3, 7, 6, 8, 2, 1]})
result = pd.merge(df2, df1, on='id', how='left', suffixes=('_x', ''))
result['value'] = result['value'].fillna(result['value_x'])
result = result[['id', 'value']]
print(result)

yields the same result, though the first method is simpler.

Upvotes: 1

Related Questions