Naive Babes
Naive Babes

Reputation: 375

Python - Sort dataframe through all days of the week

I've a dataframe with columns Description, Day and Count.

I applied

df.sort_values(['Count'], ascending = False)

on it to yield the following output:

     Count        Day                       Error Description
261   4846   Thursday          N25846 External EMERGENCY STOP
263   3993  Wednesday          N25846 External EMERGENCY STOP
257   3303     Friday          N25846 External EMERGENCY STOP
504   3227  Wednesday                        N63 Handwheel? C
795   2954   Thursday       P873 ENCLOSURE DOOR CAN BE OPENED
797   2778  Wednesday       P873 ENCLOSURE DOOR CAN BE OPENED
791   2644     Friday       P873 ENCLOSURE DOOR CAN BE OPENED
796   2633    Tuesday       P873 ENCLOSURE DOOR CAN BE OPENED
262   2480    Tuesday          N25846 External EMERGENCY STOP
501   2157     Monday                        N63 Handwheel? C
601   2130   Thursday                P124 Magazine is running
597   2130     Friday                P124 Magazine is running
793   2047   Saturday       P873 ENCLOSURE DOOR CAN BE OPENED
503   1983    Tuesday                        N63 Handwheel? C
599   1961   Saturday                P124 Magazine is running
602   1921    Tuesday                P124 Magazine is running
792   1900     Monday       P873 ENCLOSURE DOOR CAN BE OPENED
603   1865  Wednesday                P124 Magazine is running
502   1705   Saturday                        N63 Handwheel? C

I was wondering if there was a way to sort the dataframe to display distribution of a top errors through all days of the week. The expected output is: (assuming that N25846 External EMERGENCY STOP is the top error followed by N63 Handwheel? C and so on..)

     Count        Day                       Error Description
261   4846   Thursday          N25846 External EMERGENCY STOP
263   3993  Wednesday          N25846 External EMERGENCY STOP
257   3303     Friday          N25846 External EMERGENCY STOP
262   2480    Tuesday          N25846 External EMERGENCY STOP
504   3227  Wednesday                        N63 Handwheel? C
501   2157     Monday                        N63 Handwheel? C
503   1983    Tuesday                        N63 Handwheel? C
502   1705   Saturday                        N63 Handwheel? C
795   2954   Thursday       P873 ENCLOSURE DOOR CAN BE OPENED
797   2778  Wednesday       P873 ENCLOSURE DOOR CAN BE OPENED
791   2644     Friday       P873 ENCLOSURE DOOR CAN BE OPENED
796   2633    Tuesday       P873 ENCLOSURE DOOR CAN BE OPENED
793   2047   Saturday       P873 ENCLOSURE DOOR CAN BE OPENED
792   1900     Monday       P873 ENCLOSURE DOOR CAN BE OPENED
601   2130   Thursday                P124 Magazine is running
597   2130     Friday                P124 Magazine is running
599   1961   Saturday                P124 Magazine is running
602   1921    Tuesday                P124 Magazine is running
603   1865  Wednesday                P124 Magazine is running

Upvotes: 1

Views: 162

Answers (2)

Parfait
Parfait

Reputation: 107687

Similar to @Jakevdp, consider creating a new column of the maximum count in each error description, using a groupby.apply() function. Then, sort by it and the error.

def maxcount(row):
    row['MaxCount'] = row['Count'].max()
    return row

df = df.groupby(['Error Description']).apply(maxcount) \
               .sort(['MaxCount', 'Error Description'], ascending=[0,0])

OUTPUT

Count        Day                  Error Description  MaxCount
 4846   Thursday     N25846 External EMERGENCY STOP      4846
 3993  Wednesday     N25846 External EMERGENCY STOP      4846
 3303     Friday     N25846 External EMERGENCY STOP      4846
 2480    Tuesday     N25846 External EMERGENCY STOP      4846
 3227  Wednesday                   N63 Handwheel? C      3227
 2157     Monday                   N63 Handwheel? C      3227
 1983    Tuesday                   N63 Handwheel? C      3227
 1705   Saturday                   N63 Handwheel? C      3227
 2954   Thursday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2778  Wednesday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2644     Friday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2633    Tuesday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2047   Saturday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 1900     Monday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2130   Thursday           P124 Magazine is running      2130
 2130     Friday           P124 Magazine is running      2130
 1961   Saturday           P124 Magazine is running      2130
 1921    Tuesday           P124 Magazine is running      2130
 1865  Wednesday           P124 Magazine is running      2130

Upvotes: 1

jakevdp
jakevdp

Reputation: 86433

You can do this with a group-by, followed by a join, followed by a sort. For example:

totals = df.groupby('Error Description').sum()

joined = df.join(totals, on='Error Description', rsuffix='_total')

result = joined.sort_values(['Count_total', 'Count'], ascending=False)

The output is

>>> print(result)

     Count        Day                  Error Description  Count_total
795   2954   Thursday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
797   2778  Wednesday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
791   2644     Friday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
796   2633    Tuesday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
793   2047   Saturday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
792   1900     Monday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
261   4846   Thursday     N25846 External EMERGENCY STOP        14622
263   3993  Wednesday     N25846 External EMERGENCY STOP        14622
257   3303     Friday     N25846 External EMERGENCY STOP        14622
262   2480    Tuesday     N25846 External EMERGENCY STOP        14622
601   2130   Thursday           P124 Magazine is running        10007
597   2130     Friday           P124 Magazine is running        10007
599   1961   Saturday           P124 Magazine is running        10007
602   1921    Tuesday           P124 Magazine is running        10007
603   1865  Wednesday           P124 Magazine is running        10007
504   3227  Wednesday                   N63 Handwheel? C         9072
501   2157     Monday                   N63 Handwheel? C         9072
503   1983    Tuesday                   N63 Handwheel? C         9072
502   1705   Saturday                   N63 Handwheel? C         9072

Note that this doesn't match your example output, but it is the correct ordering by total error count. Looking at your example output, perhaps what you want is to replace sum() above with max() — it's not entirely clear to me from your question.

Upvotes: 1

Related Questions