Reputation: 467
I'm trying to write the following custom groupby function to compute the percentage of 1s in a binary column, b:
def _get_perc(ds):
    try:
        return ds.value_counts(normalize=True).loc[1]
    except KeyError:
        return 0.0

df[['group','b']].groupby('group').apply(_get_perc)
But pandas passes ds to the function as a DataFrame instead of a Series, so I get AttributeError: 'DataFrame' object has no attribute 'value_counts'.
How should I write this so that ds is a Series?
Upvotes: 1
Views: 121
Reputation: 164623
Just index the GroupBy object with a column label, so that each group is passed to the function as a Series:
def _get_perc(ds):
    try:
        return ds.value_counts(normalize=True).loc[1]
    except KeyError:
        return 0.0

df[['group','b']].groupby('group')['b'].apply(_get_perc)
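To see why the column label fixes the error, here is a quick check of what type apply hands to the function in each case. The sample frame and the helper type_name are hypothetical, made up only for this illustration:

```python
import pandas as pd

# Hypothetical sample data with the same two columns as in the question
df = pd.DataFrame({"group": [1, 1, 2, 2, 2], "b": [0, 1, 0, 0, 1]})

def type_name(chunk):
    # Return the class name of whatever pandas passes in for each group
    return type(chunk).__name__

# Without a column label (DataFrameGroupBy), each group arrives as a DataFrame
frame_types = set(df[['group', 'b']].groupby('group').apply(type_name))

# With ['b'] (SeriesGroupBy), each group arrives as a Series,
# so Series methods such as value_counts are available
series_types = set(df[['group', 'b']].groupby('group')['b'].apply(type_name))

print(frame_types)   # {'DataFrame'}
print(series_types)  # {'Series'}
```

Newer pandas versions may emit a warning about grouping columns in the DataFrame case; the type passed in is still a DataFrame.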
Upvotes: 1
Reputation: 402363
Select column b explicitly after grouping, so that apply works on that column:
df
group b
0 1 0
1 1 1
2 2 0
3 2 0
4 2 1
df.groupby('group')['b'].apply(_get_perc)
group
1 0.500000
2 0.333333
Name: b, dtype: float64
Slicing df[['group','b']] beforehand is not needed.
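A self-contained reproduction of this answer, assuming the frame printed above is built from plain integer lists (the construction itself is not shown in the answer):

```python
import pandas as pd

# Rebuild the small frame printed above (assumed construction)
df = pd.DataFrame({"group": [1, 1, 2, 2, 2], "b": [0, 1, 0, 0, 1]})

def _get_perc(ds):
    # ds is the 'b' Series for one group
    try:
        return ds.value_counts(normalize=True).loc[1]
    except KeyError:
        # This group contains no 1s
        return 0.0

# Group on the full frame and select 'b' afterwards; no pre-slicing required
result = df.groupby('group')['b'].apply(_get_perc)
print(result)
```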
Alternatively, value_counts can be called directly on the grouped column (a SeriesGroupBy), without a custom function:
df.groupby('group')['b'].value_counts(normalize=True).xs(1, level=1, axis=0)
group
1 0.500000
2 0.333333
Name: b, dtype: float64
The additional xs step selects the normalised counts of 1s, i.e. the rows where the second index level (b) equals 1.
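One difference from _get_perc: a group that contains no 1s has no (group, 1) row after value_counts, so xs drops it instead of reporting 0.0. The sketch below uses hypothetical data with such a group (group 3) and shows two substitutes that keep it: unstack with fill_value=0, and, since b is binary, the mean of the boolean mask b == 1. Both are swapped-in techniques, not part of the answer above:

```python
import pandas as pd

# Hypothetical data: group 3 contains no 1s at all
df = pd.DataFrame({"group": [1, 1, 2, 2, 2, 3, 3],
                   "b":     [0, 1, 0, 0, 1, 0, 0]})

counts = df.groupby('group')['b'].value_counts(normalize=True)

# xs keeps only rows whose 'b' level equals 1, so group 3 disappears
via_xs = counts.xs(1, level=1)

# unstack moves the 'b' level into columns; fill_value=0 gives group 3 a 0 proportion
via_unstack = counts.unstack(fill_value=0)[1]

# For a 0/1 column, the share of 1s per group is the mean of the boolean mask
via_mean = (df['b'] == 1).groupby(df['group']).mean()

print(via_xs)
print(via_unstack)
print(via_mean)
```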
Upvotes: 3