rahlf23

Reputation: 9019

Preserving columns in output after performing sum on groupby

Given a sample df:

import pandas as pd

df = pd.DataFrame(
    [['William', 1, 0, 'T', 0, 1],
     ['James', 0, 1, 'R', 1, 1],
     ['James', 1, 0, 'S', 0, 1],
     ['Dean', 1, 0, 'R', 1, 0],
     ['William', 0, 1, 'S', 0, 0],
     ['James', 0, 0, 'S', 0, 1]],
    columns=['Name', 'x1', 'x2', 'x3', 'x4', 'x5'])

      Name  x1  x2 x3  x4  x5
0  William   1   0  T   0   1
1    James   0   1  R   1   1
2    James   1   0  S   0   1
3     Dean   1   0  R   1   0
4  William   0   1  S   0   0
5    James   0   0  S   0   1

I previously asked how to apply several filters to this df and aggregate each filtered result with a groupby, and I arrived at the following solution:

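# Aggregation to apply to each column within every group
variables = {'x1': 'sum', 'x2': 'sum', 'x4': 'sum', 'x5': 'sum'}

# Boolean masks, one per option
filters = {
    'Option1': df['x3'] == 'S',
    'Option2': df['x3'] == 'R',
    'Option3': (df['x2'] == 1) | (df['x4'] == 1) | (df['x5'] == 1),
    'Option4': df['x2'] == 1,
    'Option5': df['x2'] == 0,
    'Option6': df['x5'] == 1,
}

# Filter, group by name and aggregate, keyed by option
results = {key: df[f].groupby('Name').agg(variables) for key, f in filters.items()}

# The dict keys become the outer level of the resulting MultiIndex
out = pd.concat(results)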

After concatenating the results, I'm left with the following:

                 x1  x2  x4  x5
        Name                   
Option1 James     1   0   0   2
        William   0   1   0   0
Option2 Dean      1   0   1   0
        James     0   1   1   1
Option3 Dean      1   0   1   0
        James     1   1   1   3
        William   1   1   0   1
Option4 James     0   1   1   1
        William   0   1   0   0
Option5 Dean      1   0   1   0
        James     1   0   0   2
        William   1   0   0   1
Option6 James     1   1   1   3
        William   1   0   0   1

I then want to groupby('Name') again, which gives me:

              x1  x2  x4  x5
        Name                
Option2 Dean   1   0   1   0
Option3 Dean   1   0   1   0
Option5 Dean   1   0   1   0 


               x1  x2  x4  x5
        Name                 
Option1 James   1   0   0   2
Option2 James   0   1   1   1
Option3 James   1   1   1   3
Option4 James   0   1   1   1
Option5 James   1   0   0   2
Option6 James   1   1   1   3 


                 x1  x2  x4  x5
        Name                   
Option1 William   0   1   0   0
Option3 William   1   1   0   1
Option4 William   0   1   0   0
Option5 William   1   0   0   1
Option6 William   1   0   0   1 

However, some rows are missing from the results: a name that does not pass a given filter gets no row for that option (e.g. the filter df['x3']=='S' leaves no 'Dean' row under Option1). I feel like I am really close here, but this is my desired output, with zeros for the missing combinations (the sorting of the names is not relevant):

                  x1  x2  x4  x5
Name                   
James   Option1   1   0   0   2
        Option2   0   1   1   1
        Option3   1   1   1   3
        Option4   0   1   1   1
        Option5   1   0   0   2
        Option6   1   1   1   3
Dean    Option1   0   0   0   0
        Option2   1   0   1   0
        Option3   1   0   1   0
        Option4   0   0   0   0
        Option5   1   0   1   0
        Option6   0   0   0   0
William Option1   0   1   0   0
        Option2   0   0   0   0
        Option3   1   1   0   1
        Option4   0   1   0   0
        Option5   1   0   0   1
        Option6   1   0   0   1

Thank you for any pointers.

Upvotes: 4

Views: 65

Answers (1)

ALollz

Reputation: 59519

You can accomplish what you want by reindexing your out DataFrame and swapping the levels of the index. Starting from the result of your concatenation:

from itertools import product

# Swap the index levels
out = out.swaplevel(0,1)

# Form the product of the two index levels
ids = list(product(out.index.get_level_values(0).unique(), 
                   out.index.get_level_values(1).unique()))

# Reindex out, fill the missing rows with 0, sort the index, and cast back
# to int (the NaNs introduced by reindex turn the columns into floats)
out = out.reindex(ids).fillna(0).sort_index().astype('int')

out is now:

                 x1  x2  x4  x5
Name                           
Dean    Option1   0   0   0   0
        Option2   1   0   1   0
        Option3   1   0   1   0
        Option4   0   0   0   0
        Option5   1   0   1   0
        Option6   0   0   0   0
James   Option1   1   0   0   2
        Option2   0   1   1   1
        Option3   1   1   1   3
        Option4   0   1   1   1
        Option5   1   0   0   2
        Option6   1   1   1   3
William Option1   0   1   0   0
        Option2   0   0   0   0
        Option3   1   1   0   1
        Option4   0   1   0   0
        Option5   1   0   0   1
        Option6   1   0   0   1
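As a possible variation on the same idea, the full set of (Name, Option) pairs can be built with pd.MultiIndex.from_product and passed to reindex together with fill_value=0, so no NaN is introduced in the first place. This is only a sketch that rebuilds the question's data end to end; the names full_index and 'Option' are my own and not part of the original code, and taking the names from df['Name'].unique() assumes you want every name in df, including any that pass no filter at all.

```python
import pandas as pd

df = pd.DataFrame(
    [['William', 1, 0, 'T', 0, 1],
     ['James', 0, 1, 'R', 1, 1],
     ['James', 1, 0, 'S', 0, 1],
     ['Dean', 1, 0, 'R', 1, 0],
     ['William', 0, 1, 'S', 0, 0],
     ['James', 0, 0, 'S', 0, 1]],
    columns=['Name', 'x1', 'x2', 'x3', 'x4', 'x5'])

variables = {'x1': 'sum', 'x2': 'sum', 'x4': 'sum', 'x5': 'sum'}
filters = {
    'Option1': df['x3'] == 'S',
    'Option2': df['x3'] == 'R',
    'Option3': (df['x2'] == 1) | (df['x4'] == 1) | (df['x5'] == 1),
    'Option4': df['x2'] == 1,
    'Option5': df['x2'] == 0,
    'Option6': df['x5'] == 1,
}

results = {key: df[f].groupby('Name').agg(variables) for key, f in filters.items()}
out = pd.concat(results)

# Every (Name, Option) combination; 'Option' is a hypothetical level name
full_index = pd.MultiIndex.from_product(
    [df['Name'].unique(), list(filters)], names=['Name', 'Option'])

# Put Name first, then add the missing combinations as rows of zeros
out = out.swaplevel(0, 1).reindex(full_index, fill_value=0)
print(out)
```

The rows come out in the order the names first appear in df; add .sort_index() at the end if you want them sorted alphabetically as in the output above.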

Upvotes: 4
