Reputation: 125
I'm trying to use pd.pivot for the first time and struggling how to write it correctly. I have the following dataframe
siteid contactid name add
a01 Mr1 Abe rand1
a01 Mr2 Sam rand2
a02 Ms1 Ann rand3
a03 Mr2 Amy rand2
a03 Ms2 Ann rand3
I want to flatten this so that I have a single row for each siteid as follows.
siteid contactid_1 name_1 add_1 contactid_2 name_2 add_2 contactid_3 name_3 add_3
a01 Mr1 Abe rand1 Mr2 Sam rand2
a02 Ms1 Ann rand3
a03 Mr2 Amy rand2 Ms2 Ann rand3 Ms5 Dick rand4
I don't know how many contacts there may be per site (though don't think it will be more than 6), so need to allow for more columns.
I'm not sure if pivot is the correct way to do this, as when I tried it, it wants to aggregate the data...
Upvotes: 0
Views: 2407
Reputation: 35145
First, we create a cumulative value for each ID. This will be the column number to be expanded horizontally. Next, transform it with pd.pivot_table()
. We create a new column name and update the existing column name.
import pandas as pd
import numpy as np
import io
data = '''
siteid contactid name add
a01 Mr1 Abe rand1
a01 Mr2 Sam rand2
a02 Ms1 Ann rand3
a03 Mr2 Amy rand2
a03 Ms2 Ann rand3
'''
df = pd.read_csv(io.StringIO(data), sep='\s+')
df['flg'] = 1
df['flg'] = df.groupby('siteid')['flg'].transform(pd.Series.cumsum)
df2 = pd.pivot_table(df, index=['siteid'], values=['contactid','name','add'], columns=['flg'], fill_value='', aggfunc=lambda x: x)
new_cols = ['{}_{}'.format(x,y) for x,y in df2.columns]
df2.columns = new_cols
df2.reset_index()
| siteid | add_1 | add_2 | contactid_1 | contactid_2 | name_1 | name_2 |
|:---------|:--------|:--------|:--------------|:--------------|:---------|:---------|
| a01 | rand1 | rand2 | Mr1 | Mr2 | Abe | Sam |
| a02 | rand3 | | Ms1 | | Ann | |
| a03 | rand2 | rand3 | Mr2 | Ms2 | Amy | Ann |
Upvotes: 3