Reputation: 1097
I have 3 pandas dataframes with a structure similar to:
pandas1:
date star col1 col2
2019-01-30T00:32:18.128 tau_Cet 12 25
2019-01-30T00:34:05.525 tau_Cet 23 466
2019-01-03T03:54:59.886 HD_41248 344 997
2019-01-06T03:54:25.886 51_Peg 353 458
pandas2:
date star col3 col4
2019-01-30T00:32:18.128 tau_Cet 1 2
2019-01-30T00:34:05.525 tau_Cet 2 4
2019-01-03T03:54:59.886 HD_41248 34 99
2019-01-06T03:54:25.886 51_Peg 33 45
pandas3:
date star col5 col 4
2019-01-30T00:32:18.128 tau_Cet 4 2
2019-01-30T00:34:05.525 tau_Cet 5 4
2019-01-03T03:54:59.886 HD_41248 3 99
2019-01-06T03:54:25.886 51_Peg 333 45
How can I merge them all on the date
field?
Some properties:
the data to construct each data frames is read from individual tab separated files
the date
field will be unique to each line on each pandas, an d will be the linking key between all parameters;
the data in columns col1-5
might be from different types (string, float, date, etc)
columns col1-5
might or not have the same title. If the title is the same, the table value will also be the same.
the expected final result:
pandasMerged
date star col1 col2 col3 col4 col5
2019-01-30T00:32:18.128 tau_Cet 12 25 1 2 4
2019-01-30T00:34:05.525 tau_Cet 23 466 2 4 5
2019-01-03T03:54:59.886 HD_41248 344 997 34 99 3
2019-01-06T03:54:25.886 51_Peg 353 458 33 45 333
Any ideas/suggestions?
Thanks Jorge
Upvotes: 1
Views: 28
Reputation: 862641
Use concat
in list comprehension with DataFrame.set_index
in first step, then remove duplicated columns by DataFrame.loc
and inverted mask by Index.duplicated
and last convert MultiIndex
to columns by DataFrame.reset_index
:
dfs = [pandas1,pandas2,pandas3]
df = pd.concat([x.set_index(['date','star']) for x in dfs], axis=1)
df = df.loc[:, ~df.columns.duplicated()].reset_index()
print (df)
date star col1 col2 col3 col4 col5
0 2019-01-30T00:32:18.128 tau_Cet 12 25 1 2 4
1 2019-01-30T00:34:05.525 tau_Cet 23 466 2 4 5
2 2019-01-03T03:54:59.886 HD_41248 344 997 34 99 3
3 2019-01-06T03:54:25.886 51_Peg 353 458 33 45 333
Upvotes: 1