NsN
NsN

Reputation: 109

Python - Transpose/Pivot a column based based on a different column

I searched it and indeed I found a lot of similar questions but none of those seemed to answer my case.

I have a pd Dataframe which is a joined table consist of products and the countries in which they are sold. It's 3000 rows and 50 columns in size.

I'm uploading a photo (only part of the df) of the current situation I'm in now and the expected result I want to achieve.

enter image description here

I want to transpose the 'Country name' column into rows grouped by the 'Product code name. Please note that the new country columns are not limited to a certain amount of countries (some products has 3, some 40).

Thank you!

Upvotes: 2

Views: 90

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Try this:

df_out = df.set_index(['Product code', 
                       'Product name', 
                       df.groupby('Product code').cumcount() + 1]).unstack()

df_out.columns = [f'Country_{j}' for _, j in df_out.columns]
df_out.reset_index()

Output:

  Product code Product name    Country_1 Country_2 Country_3
0       AAA115            Y       Sweden     China       NaN
1       AAA117            Z        Egypt    Greece       NaN
2       AAA118            W       France     Italy       NaN
3       AAA123            X  Netherlands    Poland     Spain

Details:

Reshape dataframe with set_index and unstack, using cumcount to create country columns. Then flatten multiindex header using list comprehension.

Upvotes: 1

Sander van den Oord
Sander van den Oord

Reputation: 12808

Use .cumcount() to count the number of countries that a product has.
Then use .pivot() to get your dataframe in the right shape:

df = pd.DataFrame({
    'Country': ['NL', 'Poland', 'Spain', 'Sweden', 'China', 'Egypt'],
    'Product Code': ['123', '123', '115', '115', '117', '118'],
    'Product Name': ['X', 'X', 'Y', 'Y', 'Z', 'W'],
})

df['cumcount'] = df.groupby(['Product Code', 'Product Name'])['Country'].cumcount() + 1

df_pivot = df.pivot(
    index=['Product Code', 'Product Name'], 
    columns='cumcount', 
    values='Country',
).add_prefix('country_')

Resulting dataframe:

    cumcount             country_1  country_2
ProductCode Product Name        
115         Y            Spain      Sweden
117         Z            China      NaN
118         W            Egypt      NaN
123         X            NL         Poland

Upvotes: 2

Related Questions