Merging list of DFs with alternating columns output using Pandas

Question

I have the following code:

import pandas as pd

rep1 = pd.DataFrame.from_items([('Probe', ['x', 'y', 'z']), ('Gene', ['foo', 'bar', 'qux']), ('RP1',[1.00,23.22,11.12]),('RP1',["A","B","C"])   ], orient='columns')
rep2 = pd.DataFrame.from_items([('Probe', ['x', 'y', 'z']), ('Gene', ['foo', 'bar', 'qux']), ('RP2',[3.33,77.22,18.12]),('RP2',["G","I","K"])   ], orient='columns')
rep3 = pd.DataFrame.from_items([('Probe', ['x', 'y', 'k']), ('Gene', ['foo', 'bar', 'kux']), ('RP3',[99.99,98.29,8.10]),('RP2',["M","P","J"]) ], orient='columns')

tmp = [rep1, rep2, rep3]

This produces the following list of data frames:

In [56]: tmp
Out[56]:
 [  Probe Gene    RP1 RP1
 0     x  foo   1.00   A
 1     y  bar  23.22   B
 2     z  qux  11.12   C,   Probe Gene    RP2 RP2
 0     x  foo   3.33   G
 1     y  bar  77.22   I
 2     z  qux  18.12   K,   Probe Gene    RP3 RP2
 0     x  foo  99.99   M
 1     y  bar  98.29   P
 2     k  kux   8.10   J]

Each of the data frames above has the following characteristics:

It always contains 4 columns.
The third and last columns have identical names.
The first two columns are always named Probe and Gene.
The contents of Probe and Gene are always paired consistently, i.e. 'x' always goes with 'foo'.

I'm trying to merge those DFs in the list so that it produces this:

  Probe Gene    RP1    RP2    RP3  RP1  RP2  RP3
0     x  foo   1.00   3.33  99.99    A    G    M
1     y  bar  23.22  77.22  98.29    B    I    P
2     z  qux  11.12  18.12     NA    C    K   NA
3     k  kux     NA     NA   8.10   NA   NA    J

I tried this code, but it failed:

In [67]: reduce(pd.merge,tmp)
MergeError: Left data columns not unique: Index([u'Probe', u'Gene', u'RP1', u'RP1'], dtype='object')
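
The error message points at the repeated column name ('RP1' appears twice in the left frame). A quick way to confirm this is sketched below. It rebuilds rep1 with the plain DataFrame constructor, which is an assumption on my part: from_items is not available in recent pandas releases, so the constructor is used here as a stand-in.

```python
import pandas as pd

# Stand-in for rep1, built with the plain constructor (assumption: same
# content as the from_items version, including the duplicated 'RP1' name).
rep1 = pd.DataFrame(
    [["x", "foo", 1.00, "A"], ["y", "bar", 23.22, "B"], ["z", "qux", 11.12, "C"]],
    columns=["Probe", "Gene", "RP1", "RP1"],
)

# is_unique is False as soon as any column label repeats.
is_unique = rep1.columns.is_unique
# duplicated() flags every repeat after the first occurrence of a label.
dup_mask = rep1.columns.duplicated()

print(is_unique)          # False
print(dup_mask.tolist())  # [False, False, False, True]
```

So the merge seems to be rejected before any joining happens, simply because the column labels are ambiguous.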

What's the right way to do it?
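
One possible direction, offered only as a sketch and not as a confirmed answer: give the two data columns of each frame temporary unique names, outer-merge on Probe and Gene, reorder the columns, and then restore the repeated names. The helper merge_reps and the "_val" / "_lab" suffixes are hypothetical names invented for this illustration, and the frames are rebuilt with the plain DataFrame constructor because from_items is not available in recent pandas.

```python
from functools import reduce

import pandas as pd

# Example frames rebuilt with the plain constructor (assumption: same content
# as the from_items versions above, duplicate names included).
rep1 = pd.DataFrame(
    [["x", "foo", 1.00, "A"], ["y", "bar", 23.22, "B"], ["z", "qux", 11.12, "C"]],
    columns=["Probe", "Gene", "RP1", "RP1"],
)
rep2 = pd.DataFrame(
    [["x", "foo", 3.33, "G"], ["y", "bar", 77.22, "I"], ["z", "qux", 18.12, "K"]],
    columns=["Probe", "Gene", "RP2", "RP2"],
)
rep3 = pd.DataFrame(
    [["x", "foo", 99.99, "M"], ["y", "bar", 98.29, "P"], ["k", "kux", 8.10, "J"]],
    columns=["Probe", "Gene", "RP3", "RP2"],
)


def merge_reps(frames):
    """Hypothetical helper: outer-merge 4-column frames on Probe and Gene."""
    names, unique_frames = [], []
    for df in frames:
        # Take the name from the third column by position; the last column is
        # renamed by position too, so its original label does not matter.
        name = df.columns[2]
        names.append(name)
        df = df.copy()
        df.columns = ["Probe", "Gene", name + "_val", name + "_lab"]
        unique_frames.append(df)

    # With unique labels, a pairwise outer merge on the key columns works.
    merged = reduce(
        lambda left, right: pd.merge(left, right, on=["Probe", "Gene"], how="outer"),
        unique_frames,
    )

    # All numeric columns first, then all label columns.
    ordered = (
        ["Probe", "Gene"]
        + [n + "_val" for n in names]
        + [n + "_lab" for n in names]
    )
    merged = merged[ordered]

    # Restore the repeated names requested in the desired output.
    merged.columns = ["Probe", "Gene"] + names + names
    return merged


result = merge_reps([rep1, rep2, rep3])
print(result)
```

The row order of an outer merge can differ between pandas versions, so the rows may not come out in the x, y, z, k order shown in the desired output. Missing entries appear as NaN instead of NA.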
