Reputation: 99
I think this should be simple, but I'm having difficulty searching for solutions to this problem, perhaps because I don't know the best vocabulary. But to illustrate, say I have three data frames:
df1 = df({'id1':['1','2','3'], 'val1':['a','b','c']})
df2 = df({'id2':['1','2','4'], 'val2':['d','e','f']})
df3 = df({'id3':['1','5','6'], 'val3':['g','h','i']})
What I want to get is:
comb_id val1 val2 val3
1 a d g
2 b e n.d.
3 c n.d. n.d.
4 n.d. f n.d.
5 n.d. n.d. h
6 n.d. n.d. i
I think it must be an outer merge of some kind but so far I haven't gotten it to work. Anyone know the best way to go about this?
Upvotes: 2
Views: 46
Reputation: 862521
Use concat
with DataFrame.set_index
for all DataFrame
s:
df = pd.concat([df1.set_index('id1'),
df2.set_index('id2'),
df3.set_index('id3')], axis=1, sort=True)
print (df)
val1 val2 val3
1 a d g
2 b e NaN
3 c NaN NaN
4 NaN f NaN
5 NaN NaN h
6 NaN NaN i
If necessary replace missing values add DataFrame.fillna
:
df = pd.concat([df1.set_index('id1'),
df2.set_index('id2'),
df3.set_index('id3')], axis=1, sort=True).fillna('n.d.')
print (df)
val1 val2 val3
1 a d g
2 b e n.d.
3 c n.d. n.d.
4 n.d. f n.d.
5 n.d. n.d. h
6 n.d. n.d. i
Upvotes: 4