Reputation: 4651
I have the following data frame:
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4']
When I group it I get:
df.groupby(['col4','col2']).sum()
col1 col3
col4 col2
1 A 1.1 1.1
D 5.5 14.2
2 B 6.0 6.4
3 A 4.8 8.4
C 3.4 4.5
4 B 2.5 3.3
5 B 5.2 8.6
6 B 3.4 4.3
However what I would like to have is the same second level index for each first level index. This is not possible because of missing data. What I would like to achieve is:
col1 col3
col4 col2
1 A 1.1 1.1
B 0 0
C 0 0
D 5.5 14.2
2 A 0 0
B 6.0 6.4
C 0 0
D 0 0
3 A 4.8 8.4
B 0 0
C 3.4 4.5
D 0 0
4 A 0 0
B 2.5 3.3
C 0 0
D 0 0
5 A 0 0
B 5.2 8.6
C 0 0
D 0 0
6 A 0 0
B 3.4 4.3
C 0 0
D 0 0
Upvotes: 0
Views: 1210
Reputation: 176820
To account for the missing indices, you'll probably need to reindex the DataFrame returned by df.groupby(['col4','col2']).sum()
:
>>> m = pd.MultiIndex.from_product([df.col4.unique(), df.col2.unique()])
>>> df.groupby(['col4','col2']).sum().reindex(m, fill_value=0)
col1 col3
1 A 1.1 1.1
B 0.0 0.0
C 0.0 0.0
D 5.5 14.2
3 A 4.8 8.4
B 0.0 0.0
C 3.4 4.5
D 0.0 0.0
2 A 0.0 0.0
B 6.0 6.4
C 0.0 0.0
D 0.0 0.0
4 A 0.0 0.0
B 2.5 3.3
C 0.0 0.0
D 0.0 0.0
5 A 0.0 0.0
B 5.2 8.6
C 0.0 0.0
D 0.0 0.0
6 A 0.0 0.0
B 3.4 4.3
C 0.0 0.0
D 0.0 0.0
Note that unique()
returns the values in the order they appear in the column (e.g. '3' appears before '2' here). You could correct this by using .sortlevel(0)
on the reindexed DataFrame.
Upvotes: 2
Reputation: 52246
You could create a new MultiIndex
from your data, and reindex
, like this.
In [6]: idx = pd.MultiIndex.from_product([df.col4.unique(), df.col2.unique()])
In [9]: (df.groupby(['col4','col2']).sum()
.reindex(idx).fillna(0))
Out[9]:
col1 col3
1 A 1.1 1.1
B 0.0 0.0
C 0.0 0.0
D 5.5 14.2
3 A 4.8 8.4
B 0.0 0.0
C 3.4 4.5
D 0.0 0.0
2 A 0.0 0.0
B 6.0 6.4
C 0.0 0.0
D 0.0 0.0
4 A 0.0 0.0
B 2.5 3.3
C 0.0 0.0
D 0.0 0.0
5 A 0.0 0.0
B 5.2 8.6
C 0.0 0.0
D 0.0 0.0
6 A 0.0 0.0
B 3.4 4.3
C 0.0 0.0
D 0.0 0.0
Upvotes: 3