Conditionally count values in a pandas groupby object

Question

I have a pandas.core.groupby.DataFrameGroupBy object, and I am trying to count the rows in each group where TOTAL_FLOOR_AREA is greater than 30. I can count the nonzero values in each group using:

import numpy as np

grouped = master_lsoa.groupby('lsoa11')

grouped.aggregate(np.count_nonzero).TOTAL_FLOOR_AREA

But how do I conditionally count rows where the value for TOTAL_FLOOR_AREA is greater than 30?

Sam

jezrael · Accepted Answer

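I think you need to filter the rows first and then count per group. Sample data:

import numpy as np
import pandas as pd

np.random.seed(6)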

N = 15
master_lso = pd.DataFrame({'lsoa11': np.random.randint(4, size=N),
                           'TOTAL_FLOOR_AREA': np.random.choice([0,30,40,50], size=N)})
master_lso['lsoa11'] = 'a' + master_lso['lsoa11'].astype(str)
print (master_lso)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
2                 30     a3
3                  0     a0
4                 40     a2
5                  0     a1
6                 30     a3
7                  0     a2
8                 40     a0
9                  0     a2
10                 0     a1
11                50     a1
12                50     a3
13                40     a1
14                30     a1

First filter the rows by the condition using boolean indexing. Filtering before grouping is faster, because fewer rows have to be grouped.

df = master_lso[master_lso['TOTAL_FLOOR_AREA'] > 30]
print (df)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
4                 40     a2
8                 40     a0
11                50     a1
12                50     a3
13                40     a1

Then group by lsoa11 and count the rows in each group with size:

df1 = df.groupby('lsoa11')['TOTAL_FLOOR_AREA'].size().reset_index(name='Count')
print (df1)
  lsoa11  Count
0     a0      1
1     a1      3
2     a2      2
3     a3      1
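
One caveat with filtering first: a group with no rows above the threshold disappears from the result instead of showing a count of 0. If the zero counts matter, a possible alternative (a sketch, not part of the original answer) is to build the boolean mask and sum it per group. The frame below is typed in by hand from the printed sample so that it does not depend on the random seed; the second threshold of 40 is only an illustration chosen to produce empty groups.

```python
import pandas as pd

# Same 15 rows as the sample frame printed above, entered by hand
master_lso = pd.DataFrame({
    'lsoa11': ['a2', 'a1', 'a3', 'a0', 'a2', 'a1', 'a3', 'a2',
               'a0', 'a2', 'a1', 'a1', 'a3', 'a1', 'a1'],
    'TOTAL_FLOOR_AREA': [40, 50, 30, 0, 40, 0, 30, 0,
                         40, 0, 0, 50, 50, 40, 30],
})

# Boolean Series: True where the floor area exceeds 30
over_30 = master_lso['TOTAL_FLOOR_AREA'] > 30

# Summing booleans per group counts the True values
counts = (over_30.groupby(master_lso['lsoa11'])
                 .sum()
                 .reset_index(name='Count'))
print(counts)

# Illustrative higher threshold: a0 and a2 have no rows above 40,
# but they stay in the result with a count of 0
over_40 = master_lso['TOTAL_FLOOR_AREA'] > 40
counts_over_40 = (over_40.groupby(master_lso['lsoa11'])
                         .sum()
                         .reset_index(name='Count'))
print(counts_over_40)
```

For the threshold of 30 this gives the same counts as the filter-then-size approach, because every group in the sample has at least one matching row. The trade-off is that the grouping runs over all rows rather than the filtered subset.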
