Reputation: 169
This is my data frame
Date Name Attempt Count
0 2015-01-02 Adam Yes
1 2015-01-29 Adam Yes
2 2015-02-02 David No
3 2015-03-02 David Yes
4 2015-03-02 Hardik Yes
5 2015-03-02 David No
Now I need to count the number of attempts of a person month-wise. Want to count only if attempt value is 'yes' and insert the value in count column.
This is the expected output
Date Name Attempt Count
0 2015-01-02 Adam Yes 2
1 2015-01-29 Adam Yes 2
2 2015-02-02 David No 0
3 2015-03-02 David Yes 1
4 2015-03-02 Hardik Yes 1
5 2015-03-02 David No 1
Upvotes: 1
Views: 312
Reputation: 1432
You will need to group the date and names on a month frequency and count how many attempts were "Yes". Adding a little temporary helper info as unique month:
df['year_x_month'] = df['date'].apply(lambda x: x.year * x.month)
res = df[df["Attempt"].eq("Yes")]
.groupby(['year_x_month', 'Name'])['Attempt']
.count()
res
looks something like this (with MultiIndex as ('year_x_month', 'Name')):
year_x_month Name
2015 Adam 2
6045 David 1
Hardik 1
Name: Attempt, dtype: int64
Now assign the amount of attempts per person and per month using a map for the unique combinations of year/month and person name:
my_map = dict(zip(res.index, res.values))
df['Counts'] = pd.Series(zip(df['year_x_month'], df['Name'])).map(my_map))
del df['year_x_month']
Output:
date Name Attempt Counts
0 2015-01-02 Adam Yes 2.0
1 2015-01-29 Adam Yes 2.0
2 2015-02-02 David No NaN
3 2015-03-02 David Yes 1.0
4 2015-03-02 Hardik Yes 1.0
5 2015-03-02 David No 1.0
NOTE: removed previously flawed implementation
After seeing the nice answer by @ThePyGuy. I was interested on checking performance of both his and my suggestions (none "correcter" than the other I hope ;)) and found out his suggestion to take 4.53 ms and mine 3.97 ms. Note that these statistics are for this small data set. It might be performance will scale weirdly with dataframe size.
Upvotes: 1
Reputation: 18466
Convert Date
column to datetime type if not already, then assign a temporary column as month extracting the month value from the Date
column.
Then filter out rows with Yes
value for Attempt
column, and group it by month
, and Name
columns, then call size
to get count, finally call to_frame
passing count
as column name to create a dataframe out of it, and merge it with the dataframe with left
merge passing the common keys for on
parameter, then fillna(0)
, drop month column, and convert count column as integer type.
df['Date'] = pd.to_datetime(df['Date'])
df['month']=df['Date'].dt.month
(df.merge(df[df['Attempt'].eq('Yes')]
.groupby(['month', 'Name'])
.size()
.to_frame('count'),
how='left',
on=['month', 'Name']
).fillna(0).drop(columns='month').astype({'count':int})
)
Date Name Attempt count
0 2015-01-02 Adam Yes 2
1 2015-01-29 Adam Yes 2
2 2015-02-02 David No 0
3 2015-03-02 David Yes 1
4 2015-03-02 Hardik Yes 1
5 2015-03-02 David No 1
Upvotes: 3