Reputation: 233
I have data that looks like this:
Box,Code
Green,1221
Green,8391
Red,3709
Red,2911
Blue,9820
Blue,4530
Using a pandas dataframe, I'm wondering if it is possible to output something like this:
Box,Code1,Code2
Green,1221,8391
Red,3709,2911
Blue,9820,4530
My data always has an equal number of rows per 'Box'.
I've been experimenting with pivots and crosstabs (as well as stack and unstack) in pandas but haven't found anything that gets me to the 'broaden' result I'm looking for.
Upvotes: 1
Views: 68
Reputation: 863166
You can use groupby
for list
s and then DataFrame
constructor:
a = df.groupby('Box')['Code'].apply(list)
df = pd.DataFrame(a.values.tolist(), index=a.index).add_prefix('Code').reset_index()
print (df)
Box Code0 Code1
0 Blue 9820 4530
1 Green 1221 8391
2 Red 3709 2911
Or cumcount
for new Series
with pandas.pivot
:
g = df.groupby('Box').cumcount()
df = pd.pivot(index=df['Box'], columns=g, values=df['Code']).add_prefix('Code').reset_index()
print (df)
Box Code0 Code1
0 Blue 9820 4530
1 Green 1221 8391
2 Red 3709 2911
And similar solution with unstack
:
df['g'] = df.groupby('Box').cumcount()
df = df.set_index(['Box', 'g'])['Code'].unstack().add_prefix('Code').reset_index()
print (df)
g Box Code0 Code1
0 Blue 9820 4530
1 Green 1221 8391
2 Red 3709 2911
Upvotes: 4