Reputation: 475
I have several similarly structured pandas dataframes stored in a dictionary. I access a dataframe in the following way.
ex_dict['df1']
date df1price1 df1price2
10-20-2015 100 150
10-21-2015 90 100
I want to merge all of these dataframes into one dataframe by date. The dates are overlapping, but not all dataframes include all dates.
I need to go from this
df1
date df1price1 df1price2
10-20-2015 100 150
10-21-2015 90 100
10-22-2015 100 140
df2
date df2price1 df2price2
10-20-2015 110 140
10-21-2015 90 110
10-23-2015 110 120
df3
date df3price1 df3price2
10-20-2015 100 150
10-22-2015 90 100
10-23-2015 80 130
to this:
df_all
date df1price1 df1price2 ... df3price1 df3price2
10-20-2015 100 150 ... 100 150
10-21-2015 90 100 ... NaN NaN
10-22-2015 100 140 ... 90 100
10-23-2015 NaN NaN ... 80 130
I've tried lots of things, but I can't get it to work, short of repeatedly merging two at a time to create a new dataframe and then merging again onto that. The number of dataframes I need to merge varies between 4 and 10, so I need a way to do this automatically (which is why I thought passing a dict might work).
Any help on this would be incredibly appreciated.
Upvotes: 1
Views: 1247
Reputation: 31692
You could chain multiple merge calls on the date column:
df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')
In [107]: df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')
Out[107]:
df1price1 df1price2 df2price1 df2price2 df3price1 df3price2
date
10-20-2015 100 150 110 140 100 150
10-21-2015 90 100 90 110 NaN NaN
10-22-2015 100 140 NaN NaN 90 100
10-23-2015 NaN NaN 110 120 80 130
Some explanation: first you merge df1 and df2 on the date column with an outer join. Then you merge the resulting dataframe with df3 using the same arguments. Finally, you set date as the index of the resulting dataframe. If your dataframes have the dates as their index, you could first call reset_index on each of them and then merge on the name of the column containing the dates.
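Since the question has between 4 and 10 dataframes in a dictionary, the chained merge can be folded over the dictionary's values instead of being written out by hand. This is only a sketch: ex_dict below is a stand-in built from the question's sample data, and it assumes every dataframe has a plain date column and that the price column names do not collide.

```python
from functools import reduce

import pandas as pd

# Stand-in for the question's dictionary (sample data copied from the question).
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110],
                         "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}

# Outer-merge the frames pairwise, left to right, on the shared date column.
merged = reduce(
    lambda left, right: left.merge(right, on="date", how="outer"),
    ex_dict.values(),
)

# The dates here are strings, so this is a lexicographic sort; convert with
# pd.to_datetime first if the data spans more than one month or year.
df_all = merged.sort_values("date").set_index("date")
print(df_all)
```

The same expression works for any number of dataframes in the dictionary, so nothing needs to change when the count varies.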
Upvotes: 0
Reputation: 36555
You can use a concat followed by a groupby('date') to flatten the result.
In [22]: pd.concat([df1,df2,df3]).groupby('date').max()
Out[22]:
df1price1 df1price2 df2price1 df2price2 df3price1 df3price2
date
10-20-2015 100 150 110 140 100 150
10-21-2015 90 100 90 110 NaN NaN
10-22-2015 100 140 NaN NaN 90 100
10-23-2015 NaN NaN 110 120 80 130
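Applied to the dictionary from the question, the same idea might look like the sketch below. ex_dict is a stand-in built from the question's sample data. Note that max() only acts as a "flatten" here because each date appears at most once per dataframe, so every group has at most one non-missing value per column; that is an assumption about the data, not something pandas checks.

```python
import pandas as pd

# Stand-in for the question's dictionary (sample data copied from the question).
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110],
                         "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}

# Stack all frames vertically; columns missing from a frame are filled with NaN.
stacked = pd.concat(list(ex_dict.values()))

# Collapse the rows that share a date; max() skips NaN, so the single real
# value in each column survives.
df_all = stacked.groupby("date").max()
print(df_all)
```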
Edit: As BrenBarn points out in the comments, you can use concat(axis=1) if you set the join column as the index of your dataframes:
df1.index = df1.date
df2.index = df2.date
df3.index = df3.date
In [44]: pd.concat([df1,df2,df3],axis=1)
Out[44]:
date df1price1 df1price2 date df2price1 \
10-20-2015 10-20-2015 100 150 10-20-2015 110
10-21-2015 10-21-2015 90 100 10-21-2015 90
10-22-2015 10-22-2015 100 140 NaN NaN
10-23-2015 NaN NaN NaN 10-23-2015 110
df2price2 date df3price1 df3price2
10-20-2015 140 10-20-2015 100 150
10-21-2015 110 NaN NaN NaN
10-22-2015 NaN 10-22-2015 90 100
10-23-2015 120 10-23-2015 80 130
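The output above still carries a date column from each dataframe, because assigning df.index = df.date leaves the original column in place. A possible variant, sketched here on a stand-in ex_dict built from the question's sample data, uses set_index('date') instead, which moves the column into the index, and loops over the dictionary so the number of dataframes does not matter.

```python
import pandas as pd

# Stand-in for the question's dictionary (sample data copied from the question).
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110],
                         "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}

# set_index moves date out of the columns, so it is not repeated per frame.
# concat with axis=1 aligns the frames on the union of their indexes.
df_all = pd.concat(
    [df.set_index("date") for df in ex_dict.values()],
    axis=1,
).sort_index()
print(df_all)
```

This relies on the dates being unique within each dataframe, as they are in the sample data.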
Upvotes: 1