Lilly
Lilly

Reputation: 988

Compare column values in two or three dataframe and merge

I have already checked few earlier questions and I have some what unique problem. I have three excel file and I load them into three different dataframe. Basically I have to add contents of excel_1 and excel_2 and compare the contents against excel_3

Example data: (excel_1 sales Territory#1)

  Name  Year    Item    sales_Amount1
  A1        1.2019  Badam   2
  A1        1.2019  Badam   10
  A1        1.2019  carrot  8
  A1        1.2019  carrot  10
  A2        1.2019  Badam   10
  A2        1.2019  Badam   20
  A3        2.2019  soap    3
  A1        2.2019  soap    1

Example data: (excel_2 sales Territory#2)

  Name  Year    Item    sales_Amount2
  A1        1.2019  Badam   60
  A1        1.2019  Badam   10
  A2        1.2019  Badam   40
  A2        1.2019  Badam   1
  A3        2.2019  soap    1
  A3        2.2019  soap    10
  A1        2.2019  soap    10

excel_3 target also has similar data

  Name  Year    Item    target_Amount
  A1        1.2019  Badam   100
  A2        1.2019  Badam   30
  A1        1.2019  carrot  200
  A3        2.2019  soap    3

Basically I have to add sales amount 1 and 2 and compare the results against target data. I would like to arrive a single csv with column details as below. Hence I can do the calculations as I mentioned.

  Name   Year    Item   sales_Amount1  Sales_Amount2  target_Amount
  A1     1.2019  Badam  12              70             100
  A1     1.2019  carrot 18              0              200
  A2     1.2019  Badam  30              41             30
  A1     2.2019  soap   1               10             0
  A3     2.1019  soap   3               11             3
df1 = pd.read_excel(r"excel_1.xlxs")
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})

df2 = pd.read_excel(r"excel_2.xlxs")
sum_sales1 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})

df3 = pd.read_excel(r"excel_3.xlxs")
sum_sales1 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})

Basically I loaded each csv into one dataframe and then groupby to get the aggregated sum of each items as shown above. It's bit tricky now to compare and merge all three dataframe as shown above with the following columns

Name   Year    Item   sales_Amount1  Sales_Amount2  target_Amount

Is this approach right towards comparing three dataframes and merge them as one or should I move towards pivot table. It's bit confusing which one to chose before I proceed. Thanks.

Upvotes: 1

Views: 111

Answers (1)

jezrael
jezrael

Reputation: 862611

Use concat with DataFrame.fillna:

sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})
sum_sales2 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})
sum_sales3 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})

df = (pd.concat([sum_sales1, sum_sales2, sum_sales3],
        axis=1).fillna(0).astype(int).reset_index())
print (df)
  Name    Year    Item  sales_Amount1  sales_Amount2  target_Amount
0   A1  1.2019   Badam             12             70            100
1   A1  1.2019  carrot             18              0            200
2   A1  2.2019    soap              1             10              0
3   A2  1.2019   Badam             30             41             30
4   A3  2.2019    soap              3             11              3

If last column is necessary aggregate - then use list comprehension:

dfs = [df1, df2, df3]
dfs = [x.groupby(['Name','Year', 'Item']).agg({x.columns[-1]: 'sum'}) for x in dfs]

df = pd.concat(dfs, axis=1).fillna(0).astype(int).reset_index()
print (df)
  Name    Year    Item  sales_Amount1  sales_Amount2  target_Amount
0   A1  1.2019   Badam             12             70            100
1   A1  1.2019  carrot             18              0            200
2   A1  2.2019    soap              1             10              0
3   A2  1.2019   Badam             30             41             30
4   A3  2.2019    soap              3             11              3

Upvotes: 1

Related Questions