Sam Comber

Reputation: 1293

Conditionally count values in a pandas groupby object

I have a pandas.core.groupby.DataFrameGroupBy object, and I am trying to count the rows in each group where TOTAL_FLOOR_AREA is greater than 30. I can count the number of rows in each group using:

import numpy as np

grouped = master_lsoa.groupby('lsoa11')

grouped.aggregate(np.count_nonzero).TOTAL_FLOOR_AREA

But how do I conditionally count rows where the value for TOTAL_FLOOR_AREA is greater than 30?

Sam

Upvotes: 4

Views: 4108

Answers (2)

Quickbeam2k1

Reputation: 5437

You could also construct a new column indicating where the condition is met and sum it up, like this (stealing @jezrael's dataframe):

master_lso.assign(Large_Enough=lambda x: x["TOTAL_FLOOR_AREA"] > 30)\
    .groupby('lsoa11')["Large_Enough"].sum().reset_index()

Note that True values are interpreted as 1, so the sum gives the count of matching rows. The advantage over @jezrael's solution is that you can still sum up the total area per group.
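
To illustrate that last point, here is a minimal sketch that returns the conditional count and the total area per group in one pass. The sample data and the output column names `count_over_30` and `total_area` are made up for this example, and the named-aggregation syntax assumes pandas 0.25 or later:

```python
import pandas as pd

# Illustrative data; the values are invented for this sketch.
master_lso = pd.DataFrame({
    'lsoa11': ['a0', 'a0', 'a1', 'a1', 'a1', 'a2'],
    'TOTAL_FLOOR_AREA': [0, 40, 50, 30, 40, 0],
})

result = (
    master_lso
    # Boolean flag: True where the row meets the condition.
    .assign(Large_Enough=lambda x: x['TOTAL_FLOOR_AREA'] > 30)
    .groupby('lsoa11')
    # Summing the flag counts the True rows; the area is summed over all rows.
    .agg(count_over_30=('Large_Enough', 'sum'),
         total_area=('TOTAL_FLOOR_AREA', 'sum'))
    .reset_index()
)
print(result)
```

Because no rows are dropped before grouping, a group with no row above 30 (here `a2`) still appears, with a count of 0.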

Upvotes: 1

jezrael

Reputation: 862751

I think you need:

import numpy as np
import pandas as pd

np.random.seed(6)

N = 15
master_lso = pd.DataFrame({'lsoa11': np.random.randint(4, size=N),
                           'TOTAL_FLOOR_AREA': np.random.choice([0,30,40,50], size=N)})
master_lso['lsoa11'] = 'a' + master_lso['lsoa11'].astype(str)
print (master_lso)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
2                 30     a3
3                  0     a0
4                 40     a2
5                  0     a1
6                 30     a3
7                  0     a2
8                 40     a0
9                  0     a2
10                 0     a1
11                50     a1
12                50     a3
13                40     a1
14                30     a1

First filter the rows by the condition using boolean indexing. Filtering before grouping is faster, because fewer rows have to be grouped.

df = master_lso[master_lso['TOTAL_FLOOR_AREA'] > 30]
print (df)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
4                 40     a2
8                 40     a0
11                50     a1
12                50     a3
13                40     a1

Then group by lsoa11 and aggregate with size:

df1 = df.groupby('lsoa11')['TOTAL_FLOOR_AREA'].size().reset_index(name='Count')
print (df1)
  lsoa11  Count
0     a0      1
1     a1      3
2     a2      2
3     a3      1
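
One thing to keep in mind: filtering first drops any group that has no row above 30 (in the sample above every group happens to have at least one). If zero counts matter, one possible alternative is to group the boolean mask itself. A small sketch on made-up data, where the names `filtered`, `mask` and `all_groups` are only illustrative:

```python
import pandas as pd

# Illustrative data: group a2 has no row above 30.
df = pd.DataFrame({
    'lsoa11': ['a0', 'a0', 'a1', 'a2'],
    'TOTAL_FLOOR_AREA': [0, 40, 50, 30],
})

# Filter first, then count: a2 disappears from the result.
filtered = (df[df['TOTAL_FLOOR_AREA'] > 30]
            .groupby('lsoa11').size().reset_index(name='Count'))

# Group the boolean mask instead: every group is kept, including zero counts.
mask = df['TOTAL_FLOOR_AREA'] > 30
all_groups = mask.groupby(df['lsoa11']).sum().reset_index(name='Count')

print(filtered)
print(all_groups)
```

This trades the speed benefit of filtering early for a complete list of groups.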

Upvotes: 3
