Pandas append multiple columns for a single one

Question

How can I use pandas to append multiple KPI values per single customer efficiently?

Joining the pivoted DataFrame with the customers DataFrame causes problems, because country is the index of the pivoted frame while nationality is an ordinary column in customers rather than an index.

import pandas as pd

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Germany','Austria'],
                           'value':[7,8]})

See the desired result in pink:

Nickil Maveli · Accepted Answer

You could counter the mismatch in the categories through merge:

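# Pivot so that each indicator becomes its own column, indexed by country
df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
# Rename the index so that it matches the key column in customers
df.index.name = 'nationality'
# Turn the index back into a column and merge on the shared key
customers.merge(df['value'].reset_index(), on='nationality', how='outer')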

Data:

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Slovakia','Austria'],
                           'value':[7,8]})
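
As a variation on the merge above, here is a minimal sketch that skips the index rename by merging the customers' nationality column directly against the index of the pivoted frame. It uses how='left' instead of how='outer', so only existing customers are kept; the names kpi_wide and result are made up for illustration and are not part of the original answer.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# One row per country, one column per indicator (pivot_table aggregates with the mean by default).
kpi_wide = countryKPI.pivot_table(values='value', index='country', columns='indicator')
kpi_wide.columns.name = None  # drop the 'indicator' label on the column axis

# Match the nationality column on the left against the country index on the right.
# how='left' keeps every customer, including those without KPI data.
result = customers.merge(kpi_wide, left_on='nationality', right_index=True, how='left')
print(result)
```

Customers whose nationality has no KPI rows (Slovakia here) get NaN in the x and z columns, and countries without customers (Germany here) are dropped. Switch back to how='outer' with the reset_index approach if those rows should be kept.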

The problem appears to be that the pivot operation leaves you with a CategoricalIndex in your DataFrame, and calling reset_index on it raises the error you are seeing.

Simply work backwards: check the dtypes of the countryKPI and customers DataFrames and, wherever category appears, convert those columns to their string representation via astype(str).
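
The dtype check and conversion can also be done without naming each column by hand. The following is only a sketch: categoricals_to_str is a hypothetical helper name, not a pandas function, and it relies on select_dtypes to find the category columns.

```python
import pandas as pd

def categoricals_to_str(frame):
    """Return a copy of frame with every category-dtype column cast to str."""
    out = frame.copy()
    # select_dtypes(include='category') picks out only the categorical columns
    for col in out.select_dtypes(include='category').columns:
        out[col] = out[col].astype(str)
    return out

customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})
customers['nationality'] = customers['nationality'].astype('category')

fixed = categoricals_to_str(customers)
print(customers.dtypes)  # nationality is category here
print(fixed.dtypes)      # nationality is no longer category
```

The helper returns a copy, so the original frame keeps its categorical columns in case they are needed elsewhere.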

Reproducing the Error and Countering it:

Assume the DataFrames are the ones defined above:

countryKPI['indicator'] = countryKPI['indicator'].astype('category')
countryKPI['country'] = countryKPI['country'].astype('category')
customers['nationality'] = customers['nationality'].astype('category')

countryKPI.dtypes
country      category
indicator    category
value           int64
dtype: object

customers.dtypes
customer         object
nationality    category
value             int64
dtype: object

After pivot operation:

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index
CategoricalIndex(['Austria', 'Germany'], categories=['Austria', 'Germany'], ordered=False, 
                  name='country', dtype='category')
# ^^ See the categorical index

When you perform reset_index on that:

df.reset_index()

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

To counter that error, simply cast the categorical columns to str type.

countryKPI['indicator'] = countryKPI['indicator'].astype('str')
countryKPI['country'] = countryKPI['country'].astype('str')
customers['nationality'] = customers['nationality'].astype('str')

Now reset_index works, and so does the merge.
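
For completeness, a self-contained sketch of the whole round trip: start from categorical columns, cast them to str, then pivot, reset the index and merge as in the answer. The variable names index_is_categorical and result are only illustrative.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# Start from the problematic state: key columns stored as category dtype.
countryKPI = countryKPI.astype({'country': 'category', 'indicator': 'category'})
customers = customers.astype({'nationality': 'category'})

# Cast the categorical columns to plain strings before pivoting.
for col in ['country', 'indicator']:
    countryKPI[col] = countryKPI[col].astype(str)
customers['nationality'] = customers['nationality'].astype(str)

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
index_is_categorical = isinstance(df.index, pd.CategoricalIndex)
print(index_is_categorical)

df.index.name = 'nationality'
result = customers.merge(df['value'].reset_index(), on='nationality', how='outer')
print(result)
```

With how='outer', the result contains one row for every nationality found in either frame, with NaN wherever a customer or a KPI value is missing.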
