Reputation: 96304
Say I have a multi-index DataFrame in pandas, e.g.:
                    A         B         C
X   Y     Z
bar one   a -0.007381 -0.365315 -0.024817
          b -1.219794  0.370955 -0.795125
baz three a  0.145578  1.428502 -0.408384
          b -0.249321 -0.292967 -1.849202
    two   a -0.249321 -0.292967 -1.849202
    four  a  0.211234 -0.967123  1.202234
foo one   b -1.046479 -1.250595  0.781722
          a  1.314373  0.333150  0.133331
qux one   c  0.716789  0.616471 -0.298493
    two   b  0.385795 -0.915417 -1.367644
How can I count how many distinct values of one level are contained within each value of another level (e.g. the values of level Y within each X)?
E.g. in the case above the answer would be:
X Y
bar 1
baz 3
foo 1
qux 2
When I try df.groupby(level=[0, 1]).count()[0] I get:
            C  D  E
A    B
bar  one    1  1  1
     three  1  1  1
flux six    1  1  1
     three  1  1  1
foo  five   1  1  1
     one    1  1  1
     two    2  2  2
Upvotes: 27
Views: 23415
Reputation: 773
I think this should work as well:
For level A:
df.groupby(level='A').size()
For level B:
df.groupby(level=['A','B']).size()
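Note that size() counts rows per group rather than distinct sub-level values. A minimal sketch of how the two calls above behave, and how they could be chained to count the distinct B values under each A, follows. The small frame and its column name C are made up for illustration; only the level names A and B come from this answer.

```python
import pandas as pd

# Hypothetical data: index levels named 'A' and 'B', as in the answer above.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "one"), ("baz", "three"), ("baz", "two"), ("baz", "four")],
    names=["A", "B"],
)
df = pd.DataFrame({"C": [1, 2, 3, 4, 5]}, index=index)

rows_per_a = df.groupby(level="A").size()            # number of rows under each A
rows_per_pair = df.groupby(level=["A", "B"]).size()  # number of rows under each (A, B) pair

# Grouping the per-pair result by 'A' again counts the distinct B values per A.
distinct_b_per_a = rows_per_pair.groupby(level="A").size()
print(distinct_b_per_a)
```

Here bar has two rows but only one distinct B value, so the chained version reports 1 for bar where the plain size() by A reports 2.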
Upvotes: 10
Reputation: 64
You can always add a suffix to your column name and reset the index after converting to a DataFrame.
Let's say I have a pandas.Series object "s":
>>> s = train.groupby('column_name').item_id.value_counts()
>>> type(s)
pandas.core.series.Series
>>> y = s.to_frame()
>>> data = y.add_suffix('_Count').reset_index()
>>> data.head()  # a pandas DataFrame whose count column now carries the suffix "_Count"
This converts the multi-index Series into a DataFrame with a single-level index.
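As a self-contained sketch of the same steps: the contents of train below are invented stand-ins, since the original data is not shown, and the exact name of the count column after add_suffix can depend on the pandas version. Passing name= to Series.reset_index is shown as one way to fix that name explicitly.

```python
import pandas as pd

# Hypothetical stand-in for `train`; only the column names come from the answer.
train = pd.DataFrame({
    "column_name": ["u1", "u1", "u1", "u2"],
    "item_id": [10, 10, 20, 10],
})

# Multi-index Series: counts of each item_id within each column_name group.
s = train.groupby("column_name")["item_id"].value_counts()

# The approach from the answer: to_frame, add a suffix, then flatten the index.
data = s.to_frame().add_suffix("_Count").reset_index()

# Alternative: choose the name of the count column directly.
alt = s.reset_index(name="item_id_Count")
print(alt)
```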
Upvotes: 2
Reputation: 139172
You can do the following (group by level X and then calculate the number of unique values of Y in each group, which is easier when the index is reset):
In [15]: df.reset_index().groupby('X')['Y'].nunique()
Out[15]:
X
bar 1
baz 3
foo 1
qux 2
Name: Y, dtype: int64
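To try this end to end, here is a minimal sketch that rebuilds a frame with the same index labels as in the question. The numeric values are random placeholders, not the question's numbers. The second variant, which works on the index alone through MultiIndex.to_frame, is an addition and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Index labels copied from the question; the data values are random placeholders.
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "one", "a"), ("bar", "one", "b"),
        ("baz", "three", "a"), ("baz", "three", "b"),
        ("baz", "two", "a"), ("baz", "four", "a"),
        ("foo", "one", "b"), ("foo", "one", "a"),
        ("qux", "one", "c"), ("qux", "two", "b"),
    ],
    names=["X", "Y", "Z"],
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), index=index, columns=["A", "B", "C"])

# Approach from the answer: move the index into columns, then count distinct Y per X.
counts = df.reset_index().groupby("X")["Y"].nunique()

# Variant that only converts the index to a frame, leaving the data columns untouched.
counts_from_index = df.index.to_frame(index=False).groupby("X")["Y"].nunique()
print(counts)
```

Working from the index alone avoids copying the data columns, which may matter for wide frames.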
Upvotes: 31