Broaden pandas dataframe

Question

I have data that looks like this:

Box,Code
Green,1221
Green,8391
Red,3709
Red,2911
Blue,9820
Blue,4530

Using a pandas dataframe, I'm wondering if it is possible to output something like this:

Box,Code1,Code2
Green,1221,8391
Red,3709,2911
Blue,9820,4530

My data always has an equal number of rows per 'Box'.

I've been experimenting with pivots and crosstabs (as well as stack and unstack) in pandas but haven't found anything that gets me to the 'broaden' result I'm looking for.

jezrael · Accepted Answer

You can use groupby for lists and then DataFrame constructor:

a = df.groupby('Box')['Code'].apply(list)
df = pd.DataFrame(a.values.tolist(), index=a.index).add_prefix('Code').reset_index()
print (df)
     Box  Code0  Code1
0   Blue   9820   4530
1  Green   1221   8391
2    Red   3709   2911

Or cumcount for new Series with pandas.pivot:

g = df.groupby('Box').cumcount()
df = pd.pivot(index=df['Box'], columns=g, values=df['Code']).add_prefix('Code').reset_index()
print (df)
     Box  Code0  Code1
0   Blue   9820   4530
1  Green   1221   8391
2    Red   3709   2911

And similar solution with unstack:

df['g'] = df.groupby('Box').cumcount()
df = df.set_index(['Box', 'g'])['Code'].unstack().add_prefix('Code').reset_index()
print (df)
g    Box  Code0  Code1
0   Blue   9820   4530
1  Green   1221   8391
2    Red   3709   2911

Broaden pandas dataframe

Answers (1)

Related Questions