LogCapy

Reputation: 467

Writing a custom function for groupby column

Trying to write the following custom groupby function to count the percentages of 1s in a given binary column, b:

def _get_perc(ds):
    try: 
        return ds.value_counts(normalize=True).loc[1]
    except KeyError: 
        return 0.0
df[['group','b']].groupby('group').apply(_get_perc)

But pandas passes ds to the function as a DataFrame rather than a Series, so it raises AttributeError: 'DataFrame' object has no attribute 'value_counts'.

How should I write the function to take ds as a Series?

Upvotes: 1

Views: 121

Answers (2)

jpp

Reputation: 164623

Just index the GroupBy object with the column label; apply then passes each group to your function as a Series:

def _get_perc(ds):
    try: 
        return ds.value_counts(normalize=True).loc[1]
    except KeyError: 
        return 0.0

df[['group','b']].groupby('group')['b'].apply(_get_perc)
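For instance, on a small frame (the sample data below is illustrative, not from the question), the indexed version behaves as intended:

```python
import pandas as pd

# Illustrative sample data: two groups, binary column b
df = pd.DataFrame({'group': [1, 1, 2, 2, 2],
                   'b':     [0, 1, 0, 0, 1]})

def _get_perc(ds):
    # ds is now a Series, so value_counts works; groups with no 1s
    # raise KeyError on .loc[1] and fall through to 0.0
    try:
        return ds.value_counts(normalize=True).loc[1]
    except KeyError:
        return 0.0

result = df[['group', 'b']].groupby('group')['b'].apply(_get_perc)
print(result)
# group
# 1    0.500000
# 2    0.333333
# Name: b, dtype: float64
```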

Upvotes: 1

cs95

Reputation: 402363

Specify explicitly that the aggregation is to be done on column b.

df
   group  b
0      1  0
1      1  1
2      2  0
3      2  0
4      2  1

df.groupby('group')['b'].apply(_get_perc)
group
1    0.500000
2    0.333333
Name: b, dtype: float64

The pre-indexing step is not needed.


Alternatively, value_counts can be called directly on the grouped Series (the SeriesGroupBy object):

df.groupby('group')['b'].value_counts(normalize=True).xs(1, level=1, axis=0)

group
1    0.500000
2    0.333333
Name: b, dtype: float64

The additional xs step is to select the normalised counts of 1s.
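As a further simplification (not in the answers above, but a common idiom): since b only holds 0s and 1s, the proportion of 1s per group is just the group mean, so no custom function or xs step is needed at all:

```python
import pandas as pd

# Same illustrative sample data as shown in the answer
df = pd.DataFrame({'group': [1, 1, 2, 2, 2],
                   'b':     [0, 1, 0, 0, 1]})

# For a 0/1 column, mean == fraction of 1s; groups with no 1s
# naturally come out as 0.0, matching the KeyError fallback
print(df.groupby('group')['b'].mean())
# group
# 1    0.500000
# 2    0.333333
# Name: b, dtype: float64
```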

Upvotes: 3
