Reputation: 934
Let's say I create the following dataframe with a df.set_index('Class','subclass'), bearing in mind there are multiple Classes with subclasses... A>Z.
Class subclass
A a
A b
A c
A d
B a
B b
How would I count the subclasses in the Class and create a separate column named no of classes such that I can see the Class with the greatest number of subclasses? I was thinking some sort of for loop which runs through the Class letters and counts the subclass if that Class letter is still the same. However, this seems a bit counterintuitive for such a problem. Would there be a more simple approach such as a df.groupby[].count?
The desired output would be:
Class subclass No. of classes
A a 4
A b
A c
A d
B a 2
B b
I have tried the level parameter as shown in group multi-index pandas dataframe but this doesn't seem to work for me
EDIT:
I did not mention that I wanted a return of the Class with the greatest number of subclasses. I achieved this with:
df.reset_index().groupby('Class')['subclass'].nunique().idxmax()
Upvotes: 4
Views: 874
Reputation: 862511
You can use transform
, but get duplicates values:
df['No. of classes'] = df.groupby(level='Class')['val'].transform('size')
print (df)
val No. of classes
Class subclass
A a 1 4
b 4 4
c 5 4
d 4 4
B a 1 2
b 2 2
But if need empty values:
df['No. of classes'] = df.groupby(level='Class')
.apply(lambda x: pd.Series( [len(x)] + [np.nan] * (len(x)-1)))
.values
print (df)
val No. of classes
Class subclass
A a 1 4.0
b 4 NaN
c 5 NaN
d 4 NaN
B a 1 2.0
b 2 NaN
Another solution for get Class
with greatest number is:
df = df.groupby(level=['Class'])
.apply(lambda x: x.index.get_level_values('subclass').nunique())
.idxmax()
print (df)
A
Upvotes: 3
Reputation: 393963
You can use transform
to add an aggregated calculation back to the original df as a new column:
In [165]:
df['No. of classes'] = df.groupby('Class')['subclass'].transform('count')
df
Out[165]:
Class subclass No. of classes
0 A a 4
1 A b 4
2 A c 4
3 A d 4
4 B a 2
5 B b 2
Upvotes: 2