Reputation: 505
I have the following dataframe df
:
names | status |
---|---|
John | Completed |
James | To Do |
Jill | To Do |
Robert | In Progress |
Jill | To Do |
Jill | To Do |
Marina | Completed |
Evy | Completed |
Evy | Completed |
Now I want the count of each type of status for each user. I can get it like this for all types of statuses.
df = pd.crosstab(df.names,df.status).reset_index("names")
So now the resulting df
is
status | names | Completed | In Progress | To Do |
---|---|---|---|---|
0 | James | 0 | 0 | 1 |
1 | Robert | 0 | 1 | 0 |
2 | John | 1 | 0 | 0 |
3 | Marina | 1 | 0 | 0 |
4 | Jill | 0 | 0 | 3 |
5 | Evy | 2 | 0 | 0 |
So my problem is how can I specify only a particular type of status value to be counted? For eg: I want only the values of In Progress
and Completed
and not To Do
. And how can I add a extra column to the above called as Total Statuses
, that will actually be the total number of rows for each name in the original dataframe?
Desired Dataframe:
status | names | Completed | In Progress | Total |
---|---|---|---|---|
0 | James | 0 | 0 | 1 |
1 | Robert | 0 | 1 | 1 |
2 | John | 1 | 0 | 1 |
3 | Marina | 1 | 0 | 1 |
4 | Jill | 0 | 0 | 3 |
5 | Evy | 2 | 0 | 2 |
Upvotes: 1
Views: 381
Reputation: 24304
Another way:
pass margins
and margins_name
parameters in pd.crosstab()
:
df=(pd.crosstab(df.names,df.status,margins=True,margins_name='Total').iloc[:-1]
.reset_index().drop('To Do',1))
OR
via crosstab()
+assign()
df=(pd.crosstab(df.names,df.status).assign(Total=lambda x:x.sum(1))
.reset_index().drop('To Do',1))
OR
In 2 steps:
df=pd.crosstab(df.names,df.status)
df=df.assign(Total=df.sum(1)).drop('To Do',1).reset_index()
Upvotes: 1
Reputation: 1280
df = pd.DataFrame({'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill', 'Marina', 'Evy', 'Evy'],
'status':['Completed', 'To Do', 'To Do', 'In Progress', 'To Do', 'To Do', 'Completed', 'Completed', 'Completed']})
df = pd.crosstab(df.names,df.status).reset_index("names")
df['Total'] = df['Completed'] + df['In Progress'] + df['To Do']
df = df.drop(columns=['To Do'])
print(df)
Output:
status names Completed In Progress Total
0 Evy 2 0 2
1 James 0 0 1
2 Jill 0 0 3
3 John 1 0 1
4 Marina 1 0 1
5 Robert 0 1 1
I can't comprehend what kind of sorting system you are using. But I think you will manage to do that yourself.
Upvotes: 0
Reputation: 304
You can create the total from the addition of the three previous columns:
df['Total'] = (df['Completed'] + df['In Progress'] + df['To Do'])
Then you can drop the 'to-do' from your new data frame as follows :
df = df.drop(columns=['To Do'])
Upvotes: 0