John Taylor
John Taylor

Reputation: 737

Pandas Pivot, Unstack or otherwise transform column to headers with proper count

My sample data:

          Division    Region     Director    Month
   0      1           A          Smith       JAN
   1      1           B          Jones       JAN
   2      2           C          Williams    FEB
   3      3           D          Johnson     FEB
   4      1           B          Watkins     MAR

The desired intermediate output is:

Division    Region     Director    JAN   FEB   MAR
1           A          Smith       1      0     0
1           B          Jones       1      0     0
2           C          Williams    0      1     0
3           D          Johnson     0      1     0
1           B          Watkins     0      0     1

The final output, ideally:

Division    Region     Director    JAN   FEB   MAR
1           A          Smith       1      0     0
            B          Jones       1      0     0
                       Watkins     0      0     1
2           C          Williams    0      1     0
3           D          Johnson     0      1     0

I have tried pivot, pivotable, unstack and various groupby combos found on SS and elsewhere. Nothing quite gets me to what I need in the intermediate or final version. Ultimately, I will be saving this to an excel file for the end user once I can get Pandas to help get it into the shape I need. Thanks for taking a moment to check it out.

Upvotes: 0

Views: 85

Answers (2)

Corralien
Corralien

Reputation: 120429

Use pivot and assign a dummy value to have a values

>>> df.assign(value=1) \
      .pivot(index=['Division', 'Region', 'Director'], 
             columns='Month', values='value') \
      .fillna(0).astype(int)[df['Month'].unique()] \
      .rename_axis(columns=None).reset_index()

   Division Region  Director  JAN  FEB  MAR
0         1      A     Smith    1    0    0
1         1      B     Jones    1    0    0
2         1      B   Watkins    0    0    1
3         2      C  Williams    0    1    0
4         3      D   Johnson    0    1    0

Update

I get the following error (which one of my other attempts brought the same error message). Value error: index contains duplicate entries, cannot reshape

Use pivot_table rather than pivot:

>>> df.assign(value=1) \
      .pivot_table(index=['Division', 'Region', 'Director'],
                   columns='Month', values='value', aggfunc='first') \
      .fillna(0).astype(int)[df['Month'].unique()] \
      .rename_axis(columns=None).reset_index()

Upvotes: 2

akuiper
akuiper

Reputation: 214977

Or use pd.get_dummies:

pd.concat([df.drop('Month', 1), pd.get_dummies(df.Month)], axis=1)

   Division Region  Director  FEB  JAN  MAR
0         1      A     Smith    0    1    0
1         1      B     Jones    0    1    0
2         2      C  Williams    1    0    0
3         3      D   Johnson    1    0    0
4         1      B   Watkins    0    0    1

For the final output, it seems you just want to sort_values:

(pd.concat([df.drop('Month', 1), pd.get_dummies(df.Month)], axis=1)
   .sort_values(['Division', 'Region']))

   Division Region  Director  FEB  JAN  MAR
0         1      A     Smith    0    1    0
1         1      B     Jones    0    1    0
4         1      B   Watkins    0    0    1
2         2      C  Williams    1    0    0
3         3      D   Johnson    1    0    0

Upvotes: 2

Related Questions