Reputation: 35
I have to groupby the below data for col1, col2 and loc and count the number of items in col3. Also, to be considered is the start and end date, i.e the date should range between 01/01/2021 to 31/01/2021. The end result should be displayed in col4.
Data
Date col1 col2 loc col3
01/01/2021 India Fruits A Mango
04/01/2021 India Fruits A Apple
08/01/2021 India Fruits A Banana
09/01/2021 India Vegetables B Onion
07/01/2021 India Vegetables B Capsicum
02/02/2021 India Vegetables B Tomato
03/01/2021 Germany Fruits C Mango
19/01/2021 Germany Fruits C Apple
28/01/2021 Germany Fruits C Banana
29/01/2021 Germany Vegetables D Onion
07/02/2021 Germany Vegetables D Capsicum
02/01/2021 Germany Vegetables D Tomato
Expected output
Date col1 col2 loc col3 col4
01/01/2021 India Fruits A Mango 3
04/01/2021 India Fruits A Apple 3
08/01/2021 India Fruits A Banana 3
09/01/2021 India Vegetables B Onion 2
07/01/2021 India Vegetables B Capsicum 2
03/01/2021 Germany Fruits C Mango 3
19/01/2021 Germany Fruits C Apple 3
28/01/2021 Germany Fruits C Banana 3
29/01/2021 Germany Vegetables D Onion 2
02/01/2021 Germany Vegetables D Tomato 2
Upvotes: 0
Views: 458
Reputation: 863791
Use Series.between
for filter by datetimes and then GroupBy.transform
:
m = pd.to_datetime(df['Date'], dayfirst=True).between('2021-01-01', '2021-01-31')
df1 = df[m].copy()
df1['col4'] = df1.groupby(['col1','col2','loc'])['col3'].transform('size')
If need count without removed:
df['col4'] = (df.assign(col3 = df['col3'].where(m))
.groupby(['col1','col2','loc'])['col3']
.transform('count'))
print (df)
Upvotes: 1