Reputation: 988
I have already checked few earlier questions and I have some what unique problem. I have three excel file and I load them into three different dataframe. Basically I have to add contents of excel_1 and excel_2 and compare the contents against excel_3
Example data: (excel_1 sales Territory#1)
Name Year Item sales_Amount1
A1 1.2019 Badam 2
A1 1.2019 Badam 10
A1 1.2019 carrot 8
A1 1.2019 carrot 10
A2 1.2019 Badam 10
A2 1.2019 Badam 20
A3 2.2019 soap 3
A1 2.2019 soap 1
Example data: (excel_2 sales Territory#2)
Name Year Item sales_Amount2
A1 1.2019 Badam 60
A1 1.2019 Badam 10
A2 1.2019 Badam 40
A2 1.2019 Badam 1
A3 2.2019 soap 1
A3 2.2019 soap 10
A1 2.2019 soap 10
excel_3 target also has similar data
Name Year Item target_Amount
A1 1.2019 Badam 100
A2 1.2019 Badam 30
A1 1.2019 carrot 200
A3 2.2019 soap 3
Basically I have to add sales amount 1 and 2 and compare the results against target data. I would like to arrive a single csv with column details as below. Hence I can do the calculations as I mentioned.
Name Year Item sales_Amount1 Sales_Amount2 target_Amount
A1 1.2019 Badam 12 70 100
A1 1.2019 carrot 18 0 200
A2 1.2019 Badam 30 41 30
A1 2.2019 soap 1 10 0
A3 2.1019 soap 3 11 3
df1 = pd.read_excel(r"excel_1.xlxs")
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})
df2 = pd.read_excel(r"excel_2.xlxs")
sum_sales1 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})
df3 = pd.read_excel(r"excel_3.xlxs")
sum_sales1 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})
Basically I loaded each csv into one dataframe and then groupby to get the aggregated sum of each items as shown above. It's bit tricky now to compare and merge all three dataframe as shown above with the following columns
Name Year Item sales_Amount1 Sales_Amount2 target_Amount
Is this approach right towards comparing three dataframes and merge them as one or should I move towards pivot table. It's bit confusing which one to chose before I proceed. Thanks.
Upvotes: 1
Views: 111
Reputation: 862611
Use concat
with DataFrame.fillna
:
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})
sum_sales2 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})
sum_sales3 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})
df = (pd.concat([sum_sales1, sum_sales2, sum_sales3],
axis=1).fillna(0).astype(int).reset_index())
print (df)
Name Year Item sales_Amount1 sales_Amount2 target_Amount
0 A1 1.2019 Badam 12 70 100
1 A1 1.2019 carrot 18 0 200
2 A1 2.2019 soap 1 10 0
3 A2 1.2019 Badam 30 41 30
4 A3 2.2019 soap 3 11 3
If last column is necessary aggregate - then use list comprehension:
dfs = [df1, df2, df3]
dfs = [x.groupby(['Name','Year', 'Item']).agg({x.columns[-1]: 'sum'}) for x in dfs]
df = pd.concat(dfs, axis=1).fillna(0).astype(int).reset_index()
print (df)
Name Year Item sales_Amount1 sales_Amount2 target_Amount
0 A1 1.2019 Badam 12 70 100
1 A1 1.2019 carrot 18 0 200
2 A1 2.2019 soap 1 10 0
3 A2 1.2019 Badam 30 41 30
4 A3 2.2019 soap 3 11 3
Upvotes: 1