Reputation: 404
I have a DataFrame. I grouped the status column by date using:
y = news_dataframe.groupby(by=[news_dataframe['date'].dt.date,news_dataframe['status']])['status'].count()
and my output is:
date        status        count
2019-05-29  selected         24
            rejected auto   243
            waiting         109
            no action      1363
2019-05-30  selected         28
            rejected auto   188
            waiting         132
            no action      1249
            repeat            3
2019-05-31  selected         13
            rejected auto     8
            waiting          23
            no action       137
            repeat            2
            source            1
Name: reasonForReject, dtype: int64
Now I want to calculate the percentage of each status within each date. How can I do this with a pandas DataFrame?
Upvotes: 2
Views: 478
Reputation: 2189
Try this (assuming df holds your output as a flat table with date, status and count columns, where the repeated dates are blank/NaN):
# fill the blank dates in consecutive rows with the date above
df = df.ffill()
df.columns = ['date', 'status', 'count']
# total count per date
df1 = df.groupby('date')['count'].sum().reset_index()
# rename the summed column to total
df1.columns = ['date', 'total']
# join the tables to get each row's count and its date's total together
df2 = df.merge(df1, on='date')
# calculate the percentage ('count' needs brackets: df2.count is the DataFrame.count method)
df2['percentage'] = df2['count'] / df2['total'] * 100
If you need a one-liner (on the forward-filled df), it's:
df['percentage'] = df['count'] / df.merge(df.groupby('date', as_index=False)['count'].sum().rename(columns={'count': 'total'}), on='date')['total'].to_numpy() * 100
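A self-contained sketch of the ffill + merge approach above, run on a hypothetical flattened copy of the grouped output. It assumes the repeated dates came back as NaN (as they would if the printed table were read back in); the rows are a subset of the question's counts.

```python
import numpy as np
import pandas as pd

# Hypothetical flattened copy of the grouped output: repeated dates are
# blank (NaN), as they would be after reading the printed table back in.
df = pd.DataFrame({
    "date": ["2019-05-29", np.nan, np.nan, "2019-05-30", np.nan],
    "status": ["selected", "waiting", "no action", "selected", "repeat"],
    "count": [24, 109, 1363, 28, 3],
})

# Fill each blank date with the date from the row above.
df = df.ffill()

# Total count per date, renamed to "total".
totals = df.groupby("date", as_index=False)["count"].sum()
totals = totals.rename(columns={"count": "total"})

# Attach each date's total to its rows, then compute the share in percent.
df2 = df.merge(totals, on="date")
df2["percentage"] = df2["count"] / df2["total"] * 100
print(df2)
```

Within each date the percentage column should add up to 100, which is a quick sanity check on the result.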
Upvotes: 0
Reputation: 639
I guess this is the shortest:
news_dataframe['date'] = news_dataframe['date'].dt.date
news_dataframe.groupby(['date','status'])['status'].count()/news_dataframe.groupby(['date'])['status'].count()
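A runnable sketch of the approach above on a few made-up rows (the timestamps and statuses are invented for illustration). The division works because pandas aligns the (date, status) Series with the date-indexed Series on their shared date level. The result is a fraction, so it is multiplied by 100 here to get a percentage.

```python
import pandas as pd

# Hypothetical raw data: one row per news item with a timestamp and a status.
news_dataframe = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-05-29 09:00", "2019-05-29 12:30", "2019-05-29 18:45",
        "2019-05-30 08:15", "2019-05-30 10:00",
    ]),
    "status": ["selected", "waiting", "waiting", "selected", "no action"],
})

# Drop the time part so rows group by calendar day.
news_dataframe["date"] = news_dataframe["date"].dt.date

# Count per (date, status) and per date.
per_status = news_dataframe.groupby(["date", "status"])["status"].count()
per_date = news_dataframe.groupby(["date"])["status"].count()

# Dividing aligns the two Series on the shared "date" index level;
# multiplying by 100 turns the fractions into percentages.
pct = per_status / per_date * 100
print(pct)
```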
Upvotes: 1
Reputation: 885
Compute two different groupbys and divide one by the other:
y_numerator = news_dataframe.groupby(by=[news_dataframe['date'].dt.date,news_dataframe['status']])['status'].count()
y_denominator = news_dataframe.groupby(by=news_dataframe['date'].dt.date)['status'].count()
y=y_numerator/y_denominator
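If you already have the count Series y from the question, here is a sketch that skips the second groupby over the raw frame: groupby(level="date").transform("sum") repeats each date's total on every row, so the division lines up row by row. The Series below is a hand-built stand-in for y using the 2019-05-29 and 2019-05-31 counts from the question, with the dates as plain strings rather than datetime.date objects (an assumption for brevity).

```python
import pandas as pd

# Hand-built stand-in for the question's y: counts indexed by (date, status).
index = pd.MultiIndex.from_tuples(
    [
        ("2019-05-29", "selected"),
        ("2019-05-29", "rejected auto"),
        ("2019-05-29", "waiting"),
        ("2019-05-29", "no action"),
        ("2019-05-31", "selected"),
        ("2019-05-31", "rejected auto"),
        ("2019-05-31", "waiting"),
        ("2019-05-31", "no action"),
        ("2019-05-31", "repeat"),
        ("2019-05-31", "source"),
    ],
    names=["date", "status"],
)
y = pd.Series([24, 243, 109, 1363, 13, 8, 23, 137, 2, 1], index=index)

# transform("sum") returns each date's total aligned to every row of y,
# so the division keeps y's (date, status) index unchanged.
pct = y / y.groupby(level="date").transform("sum") * 100
print(pct.round(2))
```

The design choice here is that transform keeps the original index, so no alignment across different index shapes is needed.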
Upvotes: 1