SGC
SGC

Reputation: 91

How to subgroup products in the dataframe?

Current Dataframe:

PROD              Party     x      y            Currency    Parent  Product agg_x   agg_y
iphone 13 pro   5AAZ27778   60     9.927        USD         Apple   Cash    120     19.854
iphone 13       6AAL321054  30     20.24441848  USD         Apple   Cash    30      20.24441848
iphone 12       9G351866    70     3.747318956  USD         Apple   Cash    140     7.494637912
iphone 12       9G351866    70     3.747318956  USD         Apple   Cash    140     7.494637912
iphone se       3G140868    750    122.6481749  USD         Apple   Cash    750     122.6481749
iphone 13 pro   5AAZ27778   60     9.927        USD         Apple   Cash    120     19.854

Expected Dataframe:

Product            Party    x      y            Currency    Parent
Cash            5AAZ27778   120    19.854          USD       Apple
iphone 13 pro   5AAZ27778   60     9.927           USD       Apple
iphone 13 pro   5AAZ27778   60     9.927           USD       Apple
Cash            6AAL321054  30     20.24441848     USD       Apple
iphone 13       6AAL321054  30     20.24441848     USD       Apple
Cash            9G351866    140    7.494637912     USD       Apple
iphone 12       9G351866    70     3.747318956     USD       Apple
iphone 12       9G351866    70     3.747318956     USD       Apple
Cash            3G140868    750    122.6481749     USD       Apple
iphone se       3G140868    750    122.6481749     USD       Apple

enter image description here

enter image description here

I want to modify the current dataframe to the expected dataframe. In the expected data frame the the common records in column "party" should be place under the "Cash" which is the aggregate. In the current dataframe, column agg_x and agg_y are the result of the groupby based on the column "party" for column x and y. Can you please provide me the code to convert the currrent dataframe to expected dataframe. Appreciate your help in this !

Upvotes: 1

Views: 51

Answers (1)

mozway
mozway

Reputation: 261830

Given the absence of clean headers, the best might be to just split and concat, removing the duplicates in cash, then reorder:

cols1 = ['PROD', 'Party', 'x', 'y', 'Currency', 'Parent']
cols2 = ['Product', 'Party', 'agg_x', 'agg_y', 'Currency', 'Parent']

out = (pd.concat([df[cols2].set_axis(cols1, axis=1)
                           .drop_duplicates('Party'),
                  df[cols1]])
         .sort_values(by='Party')
      )

output:

            PROD       Party    x           y Currency Parent
4           Cash    3G140868  750  122.648175      USD  Apple
4      iphone se    3G140868  750  122.648175      USD  Apple
0           Cash   5AAZ27778  120   19.854000      USD  Apple
0  iphone 13 pro   5AAZ27778   60    9.927000      USD  Apple
5  iphone 13 pro   5AAZ27778   60    9.927000      USD  Apple
1           Cash  6AAL321054   30   20.244418      USD  Apple
1      iphone 13  6AAL321054   30   20.244418      USD  Apple
2           Cash    9G351866  140    7.494638      USD  Apple
2      iphone 12    9G351866   70    3.747319      USD  Apple
3      iphone 12    9G351866   70    3.747319      USD  Apple

Upvotes: 1

Related Questions