Reputation: 117
I am using a dataframe setup as follows:
| date | Product | Region | Age |
| 01/12/20 | Sales | NW | 17 |
| 01/12/20 | Sales | NW | 30 |
| 01/11/20 | Sales | SW | 32 |
| 01/12/20 | Import | SW | 5 |
| 01/12/20 | Import | SW | 45 |
| 01/12/20 | Sales | NW | 31 |
| 01/12/20 | Import | NW | 2 |
I would like to create a new dataframe showing the count of rows that have an age > 30, that are grouped by date/product/region. ie:
date | Product | Region | aged |
---|---|---|---|
01/12/20 | Sales | NW | 2 |
01/11/20 | Sales | SW | 1 |
01/12/20 | Import | SW | 1 |
I have tried a number of different approaches, the latest using lambda but so far unsuccessful:
df['aged'] = df.groupby([pd.Grouper(freq="M"), 'Product', 'Region'])['Product'].transform(lambda x: count( x ) if df['age'] >= 30 else 0)
Is anyone able to help and show me where I am going wrong?
Upvotes: 0
Views: 1499
Reputation: 28659
Alternatives to the answer already proposed:
The simplest, in my opinion, would be to filter before grouping, just as the others; where I deviate is using as_index=False
, and using size
for aggregation:
(df.loc[df.Age.ge(30)]
.groupby(["date", "Product", "Region"], sort = False, as_index=False)
.agg(Aged=("Age", "size"))
)
date Product Region Aged
0 01/12/20 Sales NW 2
1 01/11/20 Sales SW 1
2 01/12/20 Import SW 1
This one uses a groupby, with the boolean condition included in the groupby:
(df.groupby(["date", "Product", "Region", df.Age.ge(30)], sort = False)
.size()
.drop(False, level="Age")
.droplevel("Age")
.reset_index(name="Aged")
)
Another option is using crosstab, again with the boolean condition:
(pd.crosstab([df.date, df.Product, df.Region], df.Age.ge(30))
.iloc[:, -1]
.loc[lambda x: x != 0]
.reset_index(name='Aged'))
date Product Region Aged
0 01/11/20 Sales SW 1
1 01/12/20 Import SW 1
2 01/12/20 Sales NW 2
Upvotes: 0
Reputation: 30920
I think you need:
new_df = (df[df['Age'].ge(30)].groupby(df.columns.difference(['Age']).tolist())['Age']
.count()
.reset_index(name='aged'))
print(new_df)
Product Region date aged
0 Import SW 01/12/20 1
1 Sales NW 01/12/20 2
2 Sales SW 01/11/20 1
Upvotes: 2
Reputation: 26676
Please Try:
df[df['Age'].ge(30)].groupby(['date', 'Product', 'Region'])\
['Age'].count().to_frame('aged')
aged
date Product Region
01/11/20 Sales SW 1
01/12/20 Import SW 1
Sales NW 2
Upvotes: 1