Jin

Reputation: 1223

apply pandas qcut function to subgroups

Suppose we create a dataframe df with the code below. I have computed a per-bin frequency count from the 'value' column of df. How do I get the frequency count of only the label = 1 samples, using the bins created earlier? Obviously I should not call qcut on the label = 1 samples alone, since the bin edges would not be the same as before.

import numpy as np
import pandas as pd

mu, sigma = 0, 0.1
theta = 0.3
s = np.random.normal(mu, sigma, 100)        # continuous values
group = np.random.binomial(1, theta, 100)   # binary label (0 or 1)
df = pd.DataFrame({'value': s, 'label': group})
factor = pd.qcut(df['value'], 5)            # five equal-frequency bins
factor_bin_count = factor.value_counts()    # samples per bin

Update: I used Jeff's solution:

df.groupby(['label',factor]).value.count()

Upvotes: 0

Views: 648

Answers (1)

Jeff

Reputation: 129018

If I understand your question correctly, you want to take a grouping factor (e.g. the one you created with qcut to bin the continuous values) and another grouper (e.g. 'label'), then perform an operation on each combination, a count in this case.

In [36]: df.groupby(['label',factor]).value.count()
Out[36]: 
label  value             
0      [-0.248, -0.0864]     14
       (-0.0864, -0.0227]    15
       (-0.0227, 0.0208]     15
       (0.0208, 0.0718]      17
       (0.0718, 0.24]        13
1      [-0.248, -0.0864]      6
       (-0.0864, -0.0227]     5
       (-0.0227, 0.0208]      5
       (0.0208, 0.0718]       3
       (0.0718, 0.24]         7
Name: value, dtype: int64
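
If you only need the label = 1 counts, an alternative sketch is to ask qcut for its bin edges with retbins=True and pass those edges to pd.cut on the subset. This is not part of the code above; the seed and the names rng, edges, subset and subset_counts are assumptions made for illustration, and observed=False is passed only so that empty bins are kept in the grouped result.

```python
import numpy as np
import pandas as pd

# Illustrative data; the seed is an arbitrary choice for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.normal(0, 0.1, 100),
    "label": rng.binomial(1, 0.3, 100),
})

# retbins=True also returns the quantile edges computed on the full column
factor, edges = pd.qcut(df["value"], 5, retbins=True)

# Bin only the label == 1 rows, reusing the edges from the full data
subset = df.loc[df["label"] == 1, "value"]
subset_bins = pd.cut(subset, bins=edges, include_lowest=True)
subset_counts = subset_bins.value_counts().sort_index()

# The grouped count gives the same numbers for label == 1
grouped = df.groupby(["label", factor], observed=False)["value"].count()

print(subset_counts)
print(grouped.loc[1])
```

Both approaches bin against the same edges, so the label = 1 counts should agree; the groupby version has the advantage of returning every label in one pass.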

Upvotes: 1
