Reputation: 9019
Given a sample df:
import pandas as pd

df = pd.DataFrame([['William', 1, 0, 'T', 0, 1],
                   ['James', 0, 1, 'R', 1, 1],
                   ['James', 1, 0, 'S', 0, 1],
                   ['Dean', 1, 0, 'R', 1, 0],
                   ['William', 0, 1, 'S', 0, 0],
                   ['James', 0, 0, 'S', 0, 1]],
                  columns=['Name', 'x1', 'x2', 'x3', 'x4', 'x5'])
      Name  x1  x2 x3  x4  x5
0  William   1   0  T   0   1
1    James   0   1  R   1   1
2    James   1   0  S   0   1
3     Dean   1   0  R   1   0
4  William   0   1  S   0   0
5    James   0   0  S   0   1
I previously asked how to apply various filters to this df and output the results of a series of functions applied to each group object from a groupby, and I arrived at the following solution:
variables = {'x1': 'sum', 'x2': 'sum', 'x4': 'sum', 'x5': 'sum'}
filters = {'Option1': df['x3'] == 'S',
           'Option2': df['x3'] == 'R',
           'Option3': (df['x2'] == 1) | (df['x4'] == 1) | (df['x5'] == 1),
           'Option4': df['x2'] == 1,
           'Option5': df['x2'] == 0,
           'Option6': df['x5'] == 1}
# One aggregated frame per filter, keyed by the filter name
results = {key: df[f].groupby('Name').agg(variables) for key, f in filters.items()}
# Stack the per-filter frames; the dict keys become the outer index level
out = pd.concat(results)
After concatenating the results, I'm left with the following:
                 x1  x2  x4  x5
        Name
Option1 James     1   0   0   2
        William   0   1   0   0
Option2 Dean      1   0   1   0
        James     0   1   1   1
Option3 Dean      1   0   1   0
        James     1   1   1   3
        William   1   1   0   1
Option4 James     0   1   1   1
        William   0   1   0   0
Option5 Dean      1   0   1   0
        James     1   0   0   2
        William   1   0   0   1
Option6 James     1   1   1   3
        William   1   0   0   1
I want to groupby('Name') again, which gives me:
              x1  x2  x4  x5
        Name
Option2 Dean   1   0   1   0
Option3 Dean   1   0   1   0
Option5 Dean   1   0   1   0

               x1  x2  x4  x5
        Name
Option1 James   1   0   0   2
Option2 James   0   1   1   1
Option3 James   1   1   1   3
Option4 James   0   1   1   1
Option5 James   1   0   0   2
Option6 James   1   1   1   3

                 x1  x2  x4  x5
        Name
Option1 William   0   1   0   0
Option3 William   1   1   0   1
Option4 William   0   1   0   0
Option5 William   1   0   0   1
Option6 William   1   0   0   1
However, some rows (or columns, depending on how you look at it) are being left out of the results: for example, the filter df['x3']=='S' leaves the Name column with no instances of 'Dean', so Dean has no Option1 row at all. I feel like I am really close here, but this is my desired output (the order of the names is not relevant):
                 x1  x2  x4  x5
Name
James   Option1   1   0   0   2
        Option2   0   1   1   1
        Option3   1   1   1   3
        Option4   0   1   1   1
        Option5   1   0   0   2
        Option6   1   1   1   3
Dean    Option1   0   0   0   0
        Option2   1   0   1   0
        Option3   1   0   1   0
        Option4   0   0   0   0
        Option5   1   0   1   0
        Option6   0   0   0   0
William Option1   0   1   0   0
        Option2   0   0   0   0
        Option3   1   1   0   1
        Option4   0   1   0   0
        Option5   1   0   0   1
        Option6   1   0   0   1
Thank you for any pointers.
Upvotes: 4
Views: 65
Reputation: 59519
You can accomplish what you want by swapping the levels of the index of your out DataFrame and then reindexing it against every (Name, Option) combination. Starting from the result of your concatenation:
from itertools import product
# Swap the index levels
out = out.swaplevel(0,1)
# Form the product of the two index levels
ids = list(product(out.index.get_level_values(0).unique(),
out.index.get_level_values(1).unique()))
# Reindex out, filling missing with 0 and sorting the index
out = out.reindex(ids).fillna(0).sort_index().astype('int')
out is now:
                 x1  x2  x4  x5
Name
Dean    Option1   0   0   0   0
        Option2   1   0   1   0
        Option3   1   0   1   0
        Option4   0   0   0   0
        Option5   1   0   1   0
        Option6   0   0   0   0
James   Option1   1   0   0   2
        Option2   0   1   1   1
        Option3   1   1   1   3
        Option4   0   1   1   1
        Option5   1   0   0   2
        Option6   1   1   1   3
William Option1   0   1   0   0
        Option2   0   0   0   0
        Option3   1   1   0   1
        Option4   0   1   0   0
        Option5   1   0   0   1
        Option6   1   0   0   1
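As a possible variation on the same idea, the target index can be built with pd.MultiIndex.from_product from the original names and the filter keys, and the zeros filled in during the reindex itself via fill_value. This is only a sketch under a few assumptions: the names full_index and full and the level label 'Option' are made up for illustration, and the filters are rebuilt from the question so that the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame([['William', 1, 0, 'T', 0, 1],
                   ['James', 0, 1, 'R', 1, 1],
                   ['James', 1, 0, 'S', 0, 1],
                   ['Dean', 1, 0, 'R', 1, 0],
                   ['William', 0, 1, 'S', 0, 0],
                   ['James', 0, 0, 'S', 0, 1]],
                  columns=['Name', 'x1', 'x2', 'x3', 'x4', 'x5'])

variables = {'x1': 'sum', 'x2': 'sum', 'x4': 'sum', 'x5': 'sum'}
filters = {'Option1': df['x3'] == 'S',
           'Option2': df['x3'] == 'R',
           'Option3': (df['x2'] == 1) | (df['x4'] == 1) | (df['x5'] == 1),
           'Option4': df['x2'] == 1,
           'Option5': df['x2'] == 0,
           'Option6': df['x5'] == 1}

# Same concatenation as in the question: outer level = filter key, inner = Name
results = {key: df[mask].groupby('Name').agg(variables)
           for key, mask in filters.items()}
out = pd.concat(results)

# Every (Name, Option) pair, taken from the unfiltered data and the filter keys,
# so names that no filter matched would still appear.
# 'Option' is a hypothetical label for the second level.
full_index = pd.MultiIndex.from_product(
    [df['Name'].unique(), list(filters)], names=['Name', 'Option'])

# Put Name first, then reindex; fill_value=0 fills the missing pairs directly,
# so the columns stay integer and no fillna/astype step is needed.
full = out.swaplevel(0, 1).reindex(full_index, fill_value=0)
print(full)
```

Here the names appear in their order of first occurrence in df rather than sorted; append .sort_index() if a sorted result is preferred.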
Upvotes: 4