James Upson
James Upson

Reputation: 65

Self creating columns based on value in another

This is very similar to the question i asked yesterday. The aim is to be able to add a functionality which will allow for a column to be created depending on the value shown in another. For example, when it finds a country code in a specified file, i would like it to create a column with the name 'Country Code Total', and sum the amount of units for every row with that same country code

This is what my script outputs at the moment:

Script Output

What i want to see:

Goal

My Script:

df['Sum of Revenue'] = df['Units Sold'] * df['Dealer Price']
    df['AR Revenue'] = df[]
    df = df.sort_values(['End Consumer Country', 'Currency Code'])
    # Sets first value of index by position
    df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()
    # Sets first value of index by position
    df.loc[df.index[0], 'Total Revenue'] = df['Sum of Revenue'].sum()
    # Sums the amout of Units with the End Consumer Country AR
    df['AR Total'] = df.loc[df['End Consumer Country'] == 'AR', 'Units Sold'].sum()
    # Sums the amount of Units with the End Consumer Country AU
    df['AU Total'] = df.loc[df['End Consumer Country'] == 'AU', 'Units Sold'].sum()
    # Sums the amount of Units with the End Consumer Country NZ
    df['NZ Total'] = df.loc[df['End Consumer Country'] == 'NZ', 'Units Sold'].sum()

However, as i know the countries that will come up in this file, i have added them accordingly to my script to find. How would i write my script so that if it finds another country code, for example GB, it would create a column called 'GB Total' and sum the units for every row with the country code set to GB.

Any help would be greatly appreciated!

Upvotes: 0

Views: 73

Answers (1)

ALollz
ALollz

Reputation: 59579

If you truly need that format, then here is how I would proceed (starting data below):

# Get those first two columns
d = {'Sum of Revenue': 'Total Revenue', 'Units Sold': 'Total Sold'}
for col, newcol in d.items():
    df.loc[df.index[0], newcol] = df[col].sum()

# Add the rest for every country:
s = df.groupby('End Consumer Country')['Units Sold'].sum().to_frame().T.add_suffix(' Total')
s.index = [df.index[0]]

df  = pd.concat([df, s], 1, sort=False)

Output: df:

  End Consumer Country  Sum of Revenue  Units Sold  Total Revenue  Total Sold  AR Total  AU Total  NZ Total  US Total
a                   AR       13.486216           1     124.007334        28.0       3.0       7.0      11.0       7.0
b                   AR       25.984073           2            NaN         NaN       NaN       NaN       NaN       NaN
c                   AU       21.697871           3            NaN         NaN       NaN       NaN       NaN       NaN
d                   AU       10.962232           4            NaN         NaN       NaN       NaN       NaN       NaN
e                   NZ       16.528398           5            NaN         NaN       NaN       NaN       NaN       NaN
f                   NZ       29.908619           6            NaN         NaN       NaN       NaN       NaN       NaN
g                   US        5.439925           7            NaN         NaN       NaN       NaN       NaN       NaN

As you can see, pandas added a bunch of NaN values as we only assigned something to the first row, and a DataFrame must be rectangular


It's far simpler to have a different DataFrame that summarizes the totals and within each country. If this is fine, then everything simplifies to a single .pivot_table

df.pivot_table(index='End Consumer Country', 
               values=['Sum of Revenue', 'Units Sold'],
               margins=True,
               aggfunc='sum').T.add_suffix(' Total)

Output:

End Consumer Country   AR Total   AU Total   NZ Total  US Total   All Total
Sum of Revenue        39.470289  32.660103  46.437018  5.439925  124.007334
Units Sold             3.000000   7.000000  11.000000  7.000000   28.000000

Same information, much simpler to code.


Sample data:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'End Consumer Country': ['AR', 'AR', 'AU', 'AU', 'NZ', 'NZ', 'US'],
                   'Sum of Revenue': np.random.normal(20,6,7),
                   'Units Sold': np.arange(1,8,1)},
                   index = list('abcdefg'))

  End Consumer Country  Sum of Revenue  Units Sold
a                   AR       13.486216           1
b                   AR       25.984073           2
c                   AU       21.697871           3
d                   AU       10.962232           4
e                   NZ       16.528398           5
f                   NZ       29.908619           6
g                   US        5.439925           7

Upvotes: 1

Related Questions