ista120
ista120

Reputation: 111

comapre two dataframe and add particular column value based on match

I have two dataframe like this:

DF1

product_name        cost  Account  Acc_no
prod1               200   NA-STG   1264509
prod2               500   NA-STG   1264509
prod3               800   NA-STG   1264509
prod5               300   NA-STG   1264509
prod3               804   NA-DEV   1264345
prod4               511   NA-DEV   1264345
prod5               308   NA-DEV   1264345

df2:

product_name        active_subscription distributed_cost  account_alias
prod1                   20              1299.498484       NA-STG
prod2                   50              3248.746211       NA-STG
prod3                   80              5197.993937       NA-STG
prod4                   80              97.993937         NA-STG
prod4                   32              1000              NA-DEV

I want add the distributed_cost from df2 into df1 based account.If any product in df2(ex:prod4 is missing in df1) is missing in df1 then corresponding records has to be added in df1.So the output I want is:

product_name        cost          Account
prod1               1499.498484   NA-STG
prod2               3748.746211   NA-STG
prod3               5997.993937   NA-STG
prod4               97.993937     NA-STG
prod5               300           NA-STG
prod3               804           NA-DEV
prod4               1511          NA-DEV
prod5               308           NA-DEV

How can I implement this in efficient way?

Upvotes: 2

Views: 75

Answers (2)

sophocles
sophocles

Reputation: 13821

You can use pd.concat() with a groupby to get your result, as below:

pd.concat([
     df1.drop('Acc_no',axis=1),
     (df2.drop('active_subscription',axis=1)).rename({'distributed_cost':'cost','account_alias':'Account'},axis=1)]) \
    .groupby(['product_name','Account'])['cost'].sum() \
    .reset_index() \
    .sort_values(['Account','product_name'],ascending=[False,True]
)

which prints:

  product_name Account         cost
0        prod1  NA-STG  1499.498484
1        prod2  NA-STG  3748.746211
3        prod3  NA-STG  5997.993937
5        prod4  NA-STG    97.993937
7        prod5  NA-STG   300.000000
2        prod3  NA-DEV   804.000000
4        prod4  NA-DEV  1511.000000
6        prod5  NA-DEV   308.000000

Upvotes: 1

Ank
Ank

Reputation: 1704

Do below steps to achieve it:

  1. Perform outer merge on your dataframes using product_names and account respective columns in both DFs.
  2. Fill the mising values in account column (Since some values are in DF2 but not in DF1).
  3. Add the cost and distributed_cost values.
  4. Finally select the columns you want to get your final dataframe.

Code:

df = df1.merge(df2, right_on=['product_name','account_alias'], left_on=['product_name','Account'], how='outer')
df['Account'] = df['Account'].fillna(df['account_alias'])
df['cost'] = df['cost'].fillna(0) + df['distributed_cost'].fillna(0)
df = df.loc[:, ['product_name','cost', 'Account']]

Output:

>>> df.sort_values(['Account', 'product_name'], ascending=[False,True])
  product_name         cost Account
0        prod1  1499.498484  NA-STG
1        prod2  3748.746211  NA-STG
2        prod3  5997.993937  NA-STG
7        prod4    97.993937  NA-STG
3        prod5   300.000000  NA-STG
4        prod3   804.000000  NA-DEV
5        prod4  1511.000000  NA-DEV
6        prod5   308.000000  NA-DEV

Upvotes: 1

Related Questions