Reputation: 1293
I have a pandas.core.groupby.DataFrameGroupBy object and I am trying to count the number of rows where the value of TOTAL_FLOOR_AREA is > 30. I can count the number of rows for each dataframe in the groupby object using:
import numpy as np
grouped = master_lsoa.groupby('lsoa11')
grouped.aggregate(np.count_nonzero).TOTAL_FLOOR_AREA
But how do I conditionally count only the rows where the value of TOTAL_FLOOR_AREA is greater than 30?
Sam
Upvotes: 4
Views: 4108
Reputation: 5437
You could also construct a new column indicating where the condition is met and sum it per group (stealing @jezrael's dataframe):
master_lso.assign(Large_Enough= lambda x:x["TOTAL_FLOOR_AREA"]>30)\
.groupby('lsoa11')["Large_Enough"].sum().reset_index()
Note that True values are interpreted as 1, so the sum gives the count of matching rows in each group.
The advantage over @jezrael's solution is that no rows are dropped, so you can still sum up the total area per group.
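A minimal sketch of that advantage, assuming a pandas version that supports named aggregation (0.25 or later). The sample data and the output column names `large_count` and `total_area` are made up for illustration; only `lsoa11` and `TOTAL_FLOOR_AREA` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
master_lsoa = pd.DataFrame({
    "lsoa11": ["a0", "a0", "a1", "a1", "a1", "a2"],
    "TOTAL_FLOOR_AREA": [40, 0, 50, 30, 45, 10],
})

summary = (
    master_lsoa
    # Boolean flag: True where the floor area exceeds the threshold.
    .assign(Large_Enough=lambda x: x["TOTAL_FLOOR_AREA"] > 30)
    .groupby("lsoa11")
    .agg(
        large_count=("Large_Enough", "sum"),     # True counts as 1
        total_area=("TOTAL_FLOOR_AREA", "sum"),  # sums all rows, not only the large ones
    )
    .reset_index()
)
print(summary)
```

Because the rows are flagged rather than filtered, a group with no area above 30 (here `a2`) still appears, with a count of 0.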
Upvotes: 1
Reputation: 862751
I think you need:
import numpy as np
import pandas as pd
np.random.seed(6)
N = 15
master_lso = pd.DataFrame({'lsoa11': np.random.randint(4, size=N),
'TOTAL_FLOOR_AREA': np.random.choice([0,30,40,50], size=N)})
master_lso['lsoa11'] = 'a' + master_lso['lsoa11'].astype(str)
print (master_lso)
    TOTAL_FLOOR_AREA  lsoa11
0                 40      a2
1                 50      a1
2                 30      a3
3                  0      a0
4                 40      a2
5                  0      a1
6                 30      a3
7                  0      a2
8                 40      a0
9                  0      a2
10                 0      a1
11                50      a1
12                50      a3
13                40      a1
14                30      a1
First, filter the rows by the condition using boolean indexing. Filtering before grouping is faster because fewer rows are left to group.
df = master_lso[master_lso['TOTAL_FLOOR_AREA'] > 30]
print (df)
    TOTAL_FLOOR_AREA  lsoa11
0                 40      a2
1                 50      a1
4                 40      a2
8                 40      a0
11                50      a1
12                50      a3
13                40      a1
Then groupby and aggregate size:
df1 = df.groupby('lsoa11')['TOTAL_FLOOR_AREA'].size().reset_index(name='Count')
print (df1)
   lsoa11  Count
0      a0      1
1      a1      3
2      a2      2
3      a3      1
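One behaviour of filtering first is worth checking: a group with no row above the threshold is dropped from the result entirely, whereas summing a boolean mask per group keeps it with a count of 0. A minimal sketch with made-up data (the frame and the names `filtered_counts` and `mask_counts` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: group a1 has no floor area above 30.
df = pd.DataFrame({
    "lsoa11": ["a0", "a0", "a1", "a2"],
    "TOTAL_FLOOR_AREA": [40, 50, 10, 35],
})

# Filter first, then count: a1 has no matching rows and disappears.
filtered_counts = df[df["TOTAL_FLOOR_AREA"] > 30].groupby("lsoa11").size()

# Sum a boolean mask per group: a1 is kept with a count of 0.
mask_counts = (df["TOTAL_FLOOR_AREA"] > 30).groupby(df["lsoa11"]).sum()

print(filtered_counts.to_dict())
print(mask_counts.to_dict())
```

If every group must appear in the output, the mask version is probably the safer choice; if empty groups are not needed, filtering first does less work.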
Upvotes: 3