Reputation: 4490
I have a list of Pandas DFs with every DF having the same columns
:
df1_values = [["2001-01-01","Lime",10],["2001-01-02","Lime",20]]
df2_values = [["2001-01-01","Mango",40],["2001-01-02","Mango",50],["2001-01-03","Mango",60]]
df3_values = [["2001-01-01","Orange",30]]
df1 = pd.DataFrame(df1_values,columns=["date","fruit","value"])
df2 = pd.DataFrame(df2_values,columns=["date","fruit","value"])
df3 = pd.DataFrame(df3_values,columns=["date","fruit","value"])
dfs = [df1,df2,df3]
one of the sample DFs --> DF1:
date fruit value
0 2001-01-01 Lime 10
1 2001-01-02 Lime 20
Trying to merge
all the DFs in the list
in the format below (grouped on date), EXPECTED OP:
date fruit value
2001-01-01 Lime 10
2001-01-01 Mango 40
2001-01-01 Orange 30
2001-01-02 Lime 20
2001-01-02 Mango 50
2001-01-03 Mango 60
Current iterative approach:
date_dict={}
for each_date in ["2001-01-01","2001-01-02","2001-01-03"]:
for each_df in dfs:
if each_date in date_dict:
#append the values for this date
else:
#enter the values for this date
It is working but it is taking a long time.
Pandas Approach:
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['fruit'],
how='outer'), dfs)
OP:
date_x fruit value_x date_y value_y date value
0 2001-01-01 Lime 10.0 NaN NaN NaN NaN
1 2001-01-02 Lime 20.0 NaN NaN NaN NaN
2 NaN Mango NaN 2001-01-01 40.0 NaN NaN
3 NaN Mango NaN 2001-01-02 50.0 NaN NaN
4 NaN Mango NaN 2001-01-03 60.0 NaN NaN
5 NaN Orange NaN NaN NaN 2001-01-01 30.0
Any suggestions on how to correct the mistakes could be helpful.
Upvotes: 2
Views: 166
Reputation: 195528
You can do pandas.concat
followed by .sort_values
:
print( pd.concat(dfs).sort_values('date') )
Prints:
date fruit value
0 2001-01-01 Lime 10
0 2001-01-01 Mango 40
0 2001-01-01 Orange 30
1 2001-01-02 Lime 20
1 2001-01-02 Mango 50
2 2001-01-03 Mango 60
Upvotes: 3