jorgehumberto
jorgehumberto

Reputation: 1097

merge pandas dataframes on value from column

I have 3 pandas dataframes with a structure similar to:

pandas1:
date    star    col1    col2    
2019-01-30T00:32:18.128 tau_Cet 12  25
2019-01-30T00:34:05.525 tau_Cet 23  466
2019-01-03T03:54:59.886 HD_41248    344 997
2019-01-06T03:54:25.886 51_Peg  353 458


pandas2:
date    star    col3    col4    
2019-01-30T00:32:18.128 tau_Cet 1   2
2019-01-30T00:34:05.525 tau_Cet 2   4
2019-01-03T03:54:59.886 HD_41248    34  99
2019-01-06T03:54:25.886 51_Peg  33  45


pandas3:
date    star    col5    col 4   
2019-01-30T00:32:18.128 tau_Cet 4   2
2019-01-30T00:34:05.525 tau_Cet 5   4
2019-01-03T03:54:59.886 HD_41248    3   99
2019-01-06T03:54:25.886 51_Peg  333 45

How can I merge them all on the date field?

Some properties:

the expected final result:

pandasMerged
date    star    col1    col2    col3    col4    col5
2019-01-30T00:32:18.128 tau_Cet     12  25  1   2   4
2019-01-30T00:34:05.525 tau_Cet     23  466 2   4   5
2019-01-03T03:54:59.886 HD_41248    344 997 34  99  3
2019-01-06T03:54:25.886 51_Peg      353 458 33  45  333

Any ideas/suggestions?

Thanks Jorge

Upvotes: 1

Views: 28

Answers (1)

jezrael
jezrael

Reputation: 862641

Use concat in list comprehension with DataFrame.set_index in first step, then remove duplicated columns by DataFrame.loc and inverted mask by Index.duplicated and last convert MultiIndex to columns by DataFrame.reset_index:

dfs = [pandas1,pandas2,pandas3]

df = pd.concat([x.set_index(['date','star']) for x in dfs], axis=1)
df = df.loc[:, ~df.columns.duplicated()].reset_index()
print (df)
                      date      star  col1  col2  col3  col4  col5
0  2019-01-30T00:32:18.128   tau_Cet    12    25     1     2     4
1  2019-01-30T00:34:05.525   tau_Cet    23   466     2     4     5
2  2019-01-03T03:54:59.886  HD_41248   344   997    34    99     3
3  2019-01-06T03:54:25.886    51_Peg   353   458    33    45   333

Upvotes: 1

Related Questions