Reputation: 375
I've a dataframe with columns Description
, Day
and Count
.
I applied
df.sort_values(['Count'], ascending = False)
on it to yield the following output:
Count Day Error Description
261 4846 Thursday N25846 External EMERGENCY STOP
263 3993 Wednesday N25846 External EMERGENCY STOP
257 3303 Friday N25846 External EMERGENCY STOP
504 3227 Wednesday N63 Handwheel? C
795 2954 Thursday P873 ENCLOSURE DOOR CAN BE OPENED
797 2778 Wednesday P873 ENCLOSURE DOOR CAN BE OPENED
791 2644 Friday P873 ENCLOSURE DOOR CAN BE OPENED
796 2633 Tuesday P873 ENCLOSURE DOOR CAN BE OPENED
262 2480 Tuesday N25846 External EMERGENCY STOP
501 2157 Monday N63 Handwheel? C
601 2130 Thursday P124 Magazine is running
597 2130 Friday P124 Magazine is running
793 2047 Saturday P873 ENCLOSURE DOOR CAN BE OPENED
503 1983 Tuesday N63 Handwheel? C
599 1961 Saturday P124 Magazine is running
602 1921 Tuesday P124 Magazine is running
792 1900 Monday P873 ENCLOSURE DOOR CAN BE OPENED
603 1865 Wednesday P124 Magazine is running
502 1705 Saturday N63 Handwheel? C
I was wondering if there was a way to sort the dataframe to display distribution of a top errors through all days of the week.
The expected output is: (assuming that N25846 External EMERGENCY STOP
is the top error followed by N63 Handwheel? C
and so on..)
Count Day Error Description
261 4846 Thursday N25846 External EMERGENCY STOP
263 3993 Wednesday N25846 External EMERGENCY STOP
257 3303 Friday N25846 External EMERGENCY STOP
262 2480 Tuesday N25846 External EMERGENCY STOP
504 3227 Wednesday N63 Handwheel? C
501 2157 Monday N63 Handwheel? C
503 1983 Tuesday N63 Handwheel? C
502 1705 Saturday N63 Handwheel? C
795 2954 Thursday P873 ENCLOSURE DOOR CAN BE OPENED
797 2778 Wednesday P873 ENCLOSURE DOOR CAN BE OPENED
791 2644 Friday P873 ENCLOSURE DOOR CAN BE OPENED
796 2633 Tuesday P873 ENCLOSURE DOOR CAN BE OPENED
793 2047 Saturday P873 ENCLOSURE DOOR CAN BE OPENED
792 1900 Monday P873 ENCLOSURE DOOR CAN BE OPENED
601 2130 Thursday P124 Magazine is running
597 2130 Friday P124 Magazine is running
599 1961 Saturday P124 Magazine is running
602 1921 Tuesday P124 Magazine is running
603 1865 Wednesday P124 Magazine is running
Upvotes: 1
Views: 162
Reputation: 107687
Similar to @Jakevdp, consider creating a new column of the maximum count in each error description, using a groupby.apply() function. Then, sort by it and the error.
def maxcount(row):
row['MaxCount'] = row['Count'].max()
return row
df = df.groupby(['Error Description']).apply(maxcount) \
.sort(['MaxCount', 'Error Description'], ascending=[0,0])
OUTPUT
Count Day Error Description MaxCount
4846 Thursday N25846 External EMERGENCY STOP 4846
3993 Wednesday N25846 External EMERGENCY STOP 4846
3303 Friday N25846 External EMERGENCY STOP 4846
2480 Tuesday N25846 External EMERGENCY STOP 4846
3227 Wednesday N63 Handwheel? C 3227
2157 Monday N63 Handwheel? C 3227
1983 Tuesday N63 Handwheel? C 3227
1705 Saturday N63 Handwheel? C 3227
2954 Thursday P873 ENCLOSURE DOOR CAN BE OPENED 2954
2778 Wednesday P873 ENCLOSURE DOOR CAN BE OPENED 2954
2644 Friday P873 ENCLOSURE DOOR CAN BE OPENED 2954
2633 Tuesday P873 ENCLOSURE DOOR CAN BE OPENED 2954
2047 Saturday P873 ENCLOSURE DOOR CAN BE OPENED 2954
1900 Monday P873 ENCLOSURE DOOR CAN BE OPENED 2954
2130 Thursday P124 Magazine is running 2130
2130 Friday P124 Magazine is running 2130
1961 Saturday P124 Magazine is running 2130
1921 Tuesday P124 Magazine is running 2130
1865 Wednesday P124 Magazine is running 2130
Upvotes: 1
Reputation: 86433
You can do this with a group-by, followed by a join, followed by a sort. For example:
totals = df.groupby('Error Description').sum()
joined = df.join(totals, on='Error Description', rsuffix='_total')
result = joined.sort_values(['Count_total', 'Count'], ascending=False)
The output is
>>> print(result)
Count Day Error Description Count_total
795 2954 Thursday P873 ENCLOSURE DOOR CAN BE OPENED 14956
797 2778 Wednesday P873 ENCLOSURE DOOR CAN BE OPENED 14956
791 2644 Friday P873 ENCLOSURE DOOR CAN BE OPENED 14956
796 2633 Tuesday P873 ENCLOSURE DOOR CAN BE OPENED 14956
793 2047 Saturday P873 ENCLOSURE DOOR CAN BE OPENED 14956
792 1900 Monday P873 ENCLOSURE DOOR CAN BE OPENED 14956
261 4846 Thursday N25846 External EMERGENCY STOP 14622
263 3993 Wednesday N25846 External EMERGENCY STOP 14622
257 3303 Friday N25846 External EMERGENCY STOP 14622
262 2480 Tuesday N25846 External EMERGENCY STOP 14622
601 2130 Thursday P124 Magazine is running 10007
597 2130 Friday P124 Magazine is running 10007
599 1961 Saturday P124 Magazine is running 10007
602 1921 Tuesday P124 Magazine is running 10007
603 1865 Wednesday P124 Magazine is running 10007
504 3227 Wednesday N63 Handwheel? C 9072
501 2157 Monday N63 Handwheel? C 9072
503 1983 Tuesday N63 Handwheel? C 9072
502 1705 Saturday N63 Handwheel? C 9072
Note that this doesn't match your example output, but it is the correct ordering by total error count. Looking at your example output, perhaps what you want is to replace sum()
above with max()
— it's not entirely clear to me from your question.
Upvotes: 1