Howard Sherman
Howard Sherman

Reputation: 99

Pandas: get unique elements then merge

I think this should be simple, but I'm having difficulty searching for solutions to this problem, perhaps because I don't know the best vocabulary. But to illustrate, say I have three data frames:

df1 = df({'id1':['1','2','3'], 'val1':['a','b','c']})

df2 = df({'id2':['1','2','4'], 'val2':['d','e','f']})

df3 = df({'id3':['1','5','6'], 'val3':['g','h','i']})

What I want to get is:

comb_id    val1    val2    val3
1          a       d       g
2          b       e       n.d.
3          c       n.d.    n.d.
4          n.d.    f       n.d.
5          n.d.    n.d.    h
6          n.d.    n.d.    i

I think it must be an outer merge of some kind but so far I haven't gotten it to work. Anyone know the best way to go about this?

Upvotes: 2

Views: 46

Answers (1)

jezrael
jezrael

Reputation: 862521

Use concat with DataFrame.set_index for all DataFrames:

df = pd.concat([df1.set_index('id1'),
                df2.set_index('id2'),
                df3.set_index('id3')], axis=1, sort=True)
print (df)
  val1 val2 val3
1    a    d    g
2    b    e  NaN
3    c  NaN  NaN
4  NaN    f  NaN
5  NaN  NaN    h
6  NaN  NaN    i

If necessary replace missing values add DataFrame.fillna:

df = pd.concat([df1.set_index('id1'),
                df2.set_index('id2'),
                df3.set_index('id3')], axis=1, sort=True).fillna('n.d.')
print (df)
   val1  val2  val3
1     a     d     g
2     b     e  n.d.
3     c  n.d.  n.d.
4  n.d.     f  n.d.
5  n.d.  n.d.     h
6  n.d.  n.d.     i

Upvotes: 4

Related Questions