user001
user001

Reputation: 505

Get count of particular values and the total based on another column value in dataframe using Pandas

I have the following dataframe df:

names status
John Completed
James To Do
Jill To Do
Robert In Progress
Jill To Do
Jill To Do
Marina Completed
Evy Completed
Evy Completed

Now I want the count of each type of status for each user. I can get it like this for all types of statuses.

df = pd.crosstab(df.names,df.status).reset_index("names")

So now the resulting df is

status names Completed In Progress To Do
0 James 0 0 1
1 Robert 0 1 0
2 John 1 0 0
3 Marina 1 0 0
4 Jill 0 0 3
5 Evy 2 0 0

So my problem is how can I specify only a particular type of status value to be counted? For eg: I want only the values of In Progress and Completed and not To Do. And how can I add a extra column to the above called as Total Statuses, that will actually be the total number of rows for each name in the original dataframe?

Desired Dataframe:

status names Completed In Progress Total
0 James 0 0 1
1 Robert 0 1 1
2 John 1 0 1
3 Marina 1 0 1
4 Jill 0 0 3
5 Evy 2 0 2

Upvotes: 1

Views: 381

Answers (3)

Anurag Dabas
Anurag Dabas

Reputation: 24304

Another way:

pass margins and margins_name parameters in pd.crosstab():

df=(pd.crosstab(df.names,df.status,margins=True,margins_name='Total').iloc[:-1]
      .reset_index().drop('To Do',1))

OR

via crosstab()+assign()

df=(pd.crosstab(df.names,df.status).assign(Total=lambda x:x.sum(1))
      .reset_index().drop('To Do',1))

OR

In 2 steps:

df=pd.crosstab(df.names,df.status)
df=df.assign(Total=df.sum(1)).drop('To Do',1).reset_index()

Upvotes: 1

Karina
Karina

Reputation: 1280

df = pd.DataFrame({'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill', 'Marina', 'Evy', 'Evy'],
                   'status':['Completed', 'To Do', 'To Do', 'In Progress', 'To Do', 'To Do', 'Completed', 'Completed', 'Completed']})
df = pd.crosstab(df.names,df.status).reset_index("names")
df['Total'] = df['Completed'] + df['In Progress'] + df['To Do']
df = df.drop(columns=['To Do'])
print(df)

Output:

status   names  Completed  In Progress  Total
0          Evy          2            0      2
1        James          0            0      1
2         Jill          0            0      3
3         John          1            0      1
4       Marina          1            0      1
5       Robert          0            1      1

I can't comprehend what kind of sorting system you are using. But I think you will manage to do that yourself.

Upvotes: 0

coyote
coyote

Reputation: 304

You can create the total from the addition of the three previous columns:

df['Total'] = (df['Completed'] + df['In Progress'] + df['To Do'])

Then you can drop the 'to-do' from your new data frame as follows :

df = df.drop(columns=['To Do'])

Upvotes: 0

Related Questions