Nithya Babu
Nithya Babu

Reputation: 35

Groupby and count in Pandas

I have to groupby the below data for col1, col2 and loc and count the number of items in col3. Also, to be considered is the start and end date, i.e the date should range between 01/01/2021 to 31/01/2021. The end result should be displayed in col4.

Data

Date        col1        col2        loc     col3    
01/01/2021  India       Fruits        A     Mango
04/01/2021  India       Fruits        A     Apple
08/01/2021  India       Fruits        A     Banana  
09/01/2021  India       Vegetables    B     Onion
07/01/2021  India       Vegetables    B     Capsicum
02/02/2021  India       Vegetables    B     Tomato
03/01/2021  Germany     Fruits        C     Mango
19/01/2021  Germany     Fruits        C     Apple
28/01/2021  Germany     Fruits        C     Banana  
29/01/2021  Germany     Vegetables    D     Onion
07/02/2021  Germany     Vegetables    D     Capsicum
02/01/2021  Germany     Vegetables    D     Tomato

Expected output

Date        col1        col2        loc      col3          col4
01/01/2021  India       Fruits      A        Mango          3
04/01/2021  India       Fruits      A        Apple          3
08/01/2021  India       Fruits      A        Banana         3
09/01/2021  India       Vegetables  B        Onion          2
07/01/2021  India       Vegetables  B        Capsicum       2
03/01/2021  Germany     Fruits      C        Mango          3
19/01/2021  Germany     Fruits      C        Apple          3
28/01/2021  Germany     Fruits      C        Banana         3
29/01/2021  Germany     Vegetables  D        Onion          2
02/01/2021  Germany     Vegetables  D        Tomato         2

Upvotes: 0

Views: 458

Answers (1)

jezrael
jezrael

Reputation: 863791

Use Series.between for filter by datetimes and then GroupBy.transform:

m = pd.to_datetime(df['Date'], dayfirst=True).between('2021-01-01', '2021-01-31')

df1 = df[m].copy()
df1['col4'] = df1.groupby(['col1','col2','loc'])['col3'].transform('size')

If need count without removed:

df['col4'] = (df.assign(col3 = df['col3'].where(m))
                .groupby(['col1','col2','loc'])['col3']
                .transform('count'))
print (df)

Upvotes: 1

Related Questions