Reputation: 737
My sample data:
Division Region Director Month
0 1 A Smith JAN
1 1 B Jones JAN
2 2 C Williams FEB
3 3 D Johnson FEB
4 1 B Watkins MAR
The desired intermediate output is:
Division Region Director JAN FEB MAR
1 A Smith 1 0 0
1 B Jones 1 0 0
2 C Williams 0 1 0
3 D Johnson 0 1 0
1 B Watkins 0 0 1
The final output, ideally:
Division Region Director JAN FEB MAR
1 A Smith 1 0 0
B Jones 1 0 0
Watkins 0 0 1
2 C Williams 0 1 0
3 D Johnson 0 1 0
I have tried pivot, pivotable, unstack and various groupby combos found on SS and elsewhere. Nothing quite gets me to what I need in the intermediate or final version. Ultimately, I will be saving this to an excel file for the end user once I can get Pandas to help get it into the shape I need. Thanks for taking a moment to check it out.
Upvotes: 0
Views: 85
Reputation: 120429
Use pivot
and assign a dummy value to have a values
>>> df.assign(value=1) \
.pivot(index=['Division', 'Region', 'Director'],
columns='Month', values='value') \
.fillna(0).astype(int)[df['Month'].unique()] \
.rename_axis(columns=None).reset_index()
Division Region Director JAN FEB MAR
0 1 A Smith 1 0 0
1 1 B Jones 1 0 0
2 1 B Watkins 0 0 1
3 2 C Williams 0 1 0
4 3 D Johnson 0 1 0
Update
I get the following error (which one of my other attempts brought the same error message). Value error: index contains duplicate entries, cannot reshape
Use pivot_table
rather than pivot
:
>>> df.assign(value=1) \
.pivot_table(index=['Division', 'Region', 'Director'],
columns='Month', values='value', aggfunc='first') \
.fillna(0).astype(int)[df['Month'].unique()] \
.rename_axis(columns=None).reset_index()
Upvotes: 2
Reputation: 214977
Or use pd.get_dummies
:
pd.concat([df.drop('Month', 1), pd.get_dummies(df.Month)], axis=1)
Division Region Director FEB JAN MAR
0 1 A Smith 0 1 0
1 1 B Jones 0 1 0
2 2 C Williams 1 0 0
3 3 D Johnson 1 0 0
4 1 B Watkins 0 0 1
For the final output, it seems you just want to sort_values
:
(pd.concat([df.drop('Month', 1), pd.get_dummies(df.Month)], axis=1)
.sort_values(['Division', 'Region']))
Division Region Director FEB JAN MAR
0 1 A Smith 0 1 0
1 1 B Jones 0 1 0
4 1 B Watkins 0 0 1
2 2 C Williams 1 0 0
3 3 D Johnson 1 0 0
Upvotes: 2