Sriram
Sriram

Reputation: 169

Count column value in a data frame

This is my data frame

           Date    Name     Attempt   Count
0  2015-01-02    Adam       Yes      
1  2015-01-29    Adam       Yes      
2  2015-02-02    David      No       
3  2015-03-02    David      Yes      
4  2015-03-02    Hardik     Yes       
5  2015-03-02    David      No    

Now I need to count the number of attempts of a person month-wise. Want to count only if attempt value is 'yes' and insert the value in count column.

This is the expected output

           Date    Name     Attempt    Count
0  2015-01-02    Adam        Yes          2
1  2015-01-29    Adam        Yes          2
2  2015-02-02    David       No           0
3  2015-03-02    David       Yes          1
4  2015-03-02    Hardik      Yes          1 
5  2015-03-02    David       No           1

Upvotes: 1

Views: 312

Answers (2)

deponovo
deponovo

Reputation: 1432

You will need to group the date and names on a month frequency and count how many attempts were "Yes". Adding a little temporary helper info as unique month:

df['year_x_month'] = df['date'].apply(lambda x: x.year * x.month)
res = df[df["Attempt"].eq("Yes")]
                      .groupby(['year_x_month', 'Name'])['Attempt']
                      .count()

res looks something like this (with MultiIndex as ('year_x_month', 'Name')):

year_x_month  Name  
2015          Adam      2
6045          David     1
              Hardik    1
Name: Attempt, dtype: int64

Now assign the amount of attempts per person and per month using a map for the unique combinations of year/month and person name:

my_map = dict(zip(res.index, res.values))
df['Counts'] = pd.Series(zip(df['year_x_month'], df['Name'])).map(my_map))
del df['year_x_month']

Output:

        date    Name Attempt  Counts
0 2015-01-02    Adam     Yes     2.0
1 2015-01-29    Adam     Yes     2.0
2 2015-02-02   David      No     NaN
3 2015-03-02   David     Yes     1.0
4 2015-03-02  Hardik     Yes     1.0
5 2015-03-02   David      No     1.0

NOTE: removed previously flawed implementation

After seeing the nice answer by @ThePyGuy. I was interested on checking performance of both his and my suggestions (none "correcter" than the other I hope ;)) and found out his suggestion to take 4.53 ms and mine 3.97 ms. Note that these statistics are for this small data set. It might be performance will scale weirdly with dataframe size.

Upvotes: 1

ThePyGuy
ThePyGuy

Reputation: 18466

Convert Date column to datetime type if not already, then assign a temporary column as month extracting the month value from the Date column.

Then filter out rows with Yes value for Attempt column, and group it by month, and Name columns, then call size to get count, finally call to_frame passing count as column name to create a dataframe out of it, and merge it with the dataframe with left merge passing the common keys for on parameter, then fillna(0), drop month column, and convert count column as integer type.

df['Date'] = pd.to_datetime(df['Date'])
df['month']=df['Date'].dt.month
(df.merge(df[df['Attempt'].eq('Yes')]
             .groupby(['month', 'Name'])
             .size()
             .to_frame('count'), 
        how='left', 
        on=['month', 'Name']
        ).fillna(0).drop(columns='month').astype({'count':int})
)
        Date    Name Attempt  count
0 2015-01-02    Adam     Yes      2
1 2015-01-29    Adam     Yes      2
2 2015-02-02   David      No      0
3 2015-03-02   David     Yes      1
4 2015-03-02  Hardik     Yes      1
5 2015-03-02   David      No      1

Upvotes: 3

Related Questions