Eric Truett
Eric Truett

Reputation: 3010

Pandas pivot to one row per subgroup

Given data structured as follows

from io import StringIO
import pandas as pd

data = StringIO("""
person,q,a
1,q1,Yes
1,q2,No
1,q3,Yes
1,q1,No
1,q2,No
1,q3,Yes
2,q1,Yes
2,q2,Yes
2,q3,Yes
3,q1,No
3,q2,Yes
3,q3,Yes
3,q1,Yes
3,q2,No
3,q3,Yes""")

df = pd.read_csv(data)

I am looking for a table with the following output

person  q1  q2  q3
1   Yes No  Yes
1   No  No  Yes
2   Yes Yes Yes
3   No  Yes Yes
3   Yes No  Yes

I can accomplish what I want by looping through each group and pivoting on each group, then concatenating those results, but would like to avoid doing so, if possible. Any suggestions would be appreciated.

gb = df.groupby('person')
dfs = []
for gn, _ in gb.groups.items():
    gdf = gb.get_group(gn).copy()
    gdf['grp'] = gdf.groupby('q').cumcount()
    piv = gdf.pivot(index='grp', columns='q', values='a')
    piv['person'] = gn
    dfs.append(piv)

Upvotes: 1

Views: 217

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150785

This is similar to pivot by two columns:

(df.assign(idx=df.groupby(['person','q']).cumcount())
   .pivot_table(index=['person','idx'],columns='q',values='a', aggfunc='first')
   .reset_index('idx',drop=True)
   .reset_index()
)

Or equivalently with set_index().unstack():

df.assign(idx=df.groupby(['person','q']).cumcount())
   .set_index(['person','idx','q'])['a']
   .unstack(['q'])
   .reset_index('idx',drop=True)
   .reset_index()
)

Output:

q  person   q1   q2   q3
0       1  Yes   No  Yes
1       1   No   No  Yes
2       2  Yes  Yes  Yes
3       3   No  Yes  Yes
4       3  Yes   No  Yes

Upvotes: 1

Related Questions