Reputation: 3010
Given data structured as follows
from io import StringIO
import pandas as pd
data = StringIO("""
person,q,a
1,q1,Yes
1,q2,No
1,q3,Yes
1,q1,No
1,q2,No
1,q3,Yes
2,q1,Yes
2,q2,Yes
2,q3,Yes
3,q1,No
3,q2,Yes
3,q3,Yes
3,q1,Yes
3,q2,No
3,q3,Yes""")
df = pd.read_csv(data)
I am looking for a table with the following output
person q1 q2 q3
1 Yes No Yes
1 No No Yes
2 Yes Yes Yes
3 No Yes Yes
3 Yes No Yes
I can accomplish what I want by looping through each group and pivoting on each group, then concatenating those results, but would like to avoid doing so, if possible. Any suggestions would be appreciated.
gb = df.groupby('person')
dfs = []
for gn, _ in gb.groups.items():
gdf = gb.get_group(gn).copy()
gdf['grp'] = gdf.groupby('q').cumcount()
piv = gdf.pivot(index='grp', columns='q', values='a')
piv['person'] = gn
dfs.append(piv)
Upvotes: 1
Views: 217
Reputation: 150785
This is similar to pivot by two columns:
(df.assign(idx=df.groupby(['person','q']).cumcount())
.pivot_table(index=['person','idx'],columns='q',values='a', aggfunc='first')
.reset_index('idx',drop=True)
.reset_index()
)
Or equivalently with set_index().unstack()
:
df.assign(idx=df.groupby(['person','q']).cumcount())
.set_index(['person','idx','q'])['a']
.unstack(['q'])
.reset_index('idx',drop=True)
.reset_index()
)
Output:
q person q1 q2 q3
0 1 Yes No Yes
1 1 No No Yes
2 2 Yes Yes Yes
3 3 No Yes Yes
4 3 Yes No Yes
Upvotes: 1