Reputation: 2439
I have two dataframes with unique id
s. They share some columns but not all. I need to create a combined dataframe which will include rows from missing id
s from the second dataframe. Tried merge and concat, no luck. It's probably too late, my brain stopped working. Will appreciate your help!
df1 = pd.DataFrame({
'id': ['a','b','c','d','f','g','h','j','k','l','m'],
'metric1': [123,22,356,412,54,634,72,812,129,110,200],
'metric2':[1,2,3,4,5,6,7,8,9,10,11]
})
df2 = pd.DataFrame({
'id': ['a','b','c','d','f','g','h','q','z','w'],
'metric1': [123,22,356,412,54,634,72,812,129,110]
})
df2
The result should look like this:
id metric1 metric2
0 a 123 1.0
1 b 22 2.0
2 c 356 3.0
3 d 412 4.0
4 f 54 5.0
5 g 634 6.0
6 h 72 7.0
7 j 812 8.0
8 k 129 9.0
9 l 110 10.0
10 m 200 11.0
11 q 812 NaN
12 z 129 NaN
13 w 110 NaN
Upvotes: 2
Views: 8771
Reputation: 323226
In this case using combine_first
df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
Out[766]:
id metric1 metric2
0 a 123.0 1.0
1 b 22.0 2.0
2 c 356.0 3.0
3 d 412.0 4.0
4 f 54.0 5.0
5 g 634.0 6.0
6 h 72.0 7.0
7 j 812.0 8.0
8 k 129.0 9.0
9 l 110.0 10.0
10 m 200.0 11.0
11 q 812.0 NaN
12 w 110.0 NaN
13 z 129.0 NaN
Upvotes: 7