Willem van der Spek

Reputation: 125

Sort bar chart by list values in matplotlib

I am having trouble sorting my features by their value. I would like the bar lengths to follow their position on the y-axis, i.e. the bars should be ordered by value. Unfortunately, my bar plot currently looks like this, with the features sorted alphabetically:

[Image: the current bar plot, with the features sorted alphabetically]

Right now I am running the following code:

import matplotlib.pyplot as plt

# features and importances are parallel lists defined elsewhere (not shown)
unsorted_list = [(importance, feature) for feature, importance in
                  zip(features, importances)]
sorted_list = sorted(unsorted_list)  # tuples sort by their first element (importance)

features_sorted = []
importance_sorted = []

for i in sorted_list:
    features_sorted += [i[1]]
    importance_sorted += [i[0]]

plt.title("Feature importance", fontsize=15)
plt.xlabel("Importance", fontsize=13)

plt.barh(features_sorted, importance_sorted, color="green", edgecolor='green')

# plt.savefig('importance_barh.png', dpi=100)

Here is the data that goes through it:

unsorted_list =  
 [('HR', 0.28804817462980353),
 ('BR', 0.04062328177704225),
 ('Posture', 0.09011618483921582),
 ('Activity', 0.0017821837085763366),
 ('PeakAccel', 0.002649111136700579),
 ('HRV', 0.13598729040097057),
 ('ROGState', 0.014534726412631642),
 ('ROGTime', 0.22986192060475388),
 ('VerticalMin', 0.016099772399198357),
 ('VerticalPeak', 0.012697214182994502),
 ('LateralMin', 0.029479112475744584),
 ('LateralPeak', 0.022745210003295983),
 ('SagittalMin', 0.08653071485979484),
 ('SagittalPeak', 0.028845102569277088)]

sorted_list = 
[(0.0017821837085763366, 'Activity'),
 (0.002649111136700579, 'PeakAccel'),
 (0.012697214182994502, 'VerticalPeak'),
 (0.014534726412631642, 'ROGState'),
 (0.016099772399198357, 'VerticalMin'),
 (0.022745210003295983, 'LateralPeak'),
 (0.028845102569277088, 'SagittalPeak'),
 (0.029479112475744584, 'LateralMin'),
 (0.04062328177704225, 'BR'),
 (0.08653071485979484, 'SagittalMin'),
 (0.09011618483921582, 'Posture'),
 (0.13598729040097057, 'HRV'),
 (0.22986192060475388, 'ROGTime'),
 (0.28804817462980353, 'HR')]

I recently upgraded to matplotlib 3.0.2.

Upvotes: 3

Views: 19945

Answers (3)

smv7

Reputation: 11

I came here searching for an answer to the same problem, but since no answer satisfied me, I created this simpler approach to sort any 2D structure, such as your list of tuples, or a dict_items object when sorting a dictionary:

# Sorting a list of tuples by index 0 or 1.
# (Built-in generic annotations like list[...] and dict[...] need Python 3.9+.)
unsorted_list: list[tuple[str, int]] = [('first', 1), ('third', 3), ('second', 2)]
sorted_list_by_index_0 = sorted(unsorted_list, key=lambda x: x[0])
sorted_list_by_index_1 = sorted(unsorted_list, key=lambda x: x[1])

# Sorting a dictionary by keys or values (each result is a list of (key, value) tuples).
unsorted_dict: dict[str, int] = {'first': 1, 'third': 3, 'second': 2}
sorted_dict_by_keys = sorted(unsorted_dict.items(), key=lambda x: x[0])
sorted_dict_by_values = sorted(unsorted_dict.items(), key=lambda x: x[1])

This approach let me display a matplotlib.pyplot.barh plot (a horizontal bar plot) with ordered bars from a dictionary of word frequencies. Happy coding!
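A minimal sketch of that use case, assuming a hypothetical word_counts dictionary (the words and counts are made up for illustration): sort dict.items() by value with key=, split the pairs into two lists, and pass them to barh. Passing strings as the y values relies on matplotlib placing categories in the order given, which the last answer states holds for matplotlib >= 2.2.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical word frequencies, for illustration only.
word_counts = {"apple": 3, "banana": 7, "cherry": 1, "date": 5}

# Sort the (word, count) pairs by count, ascending. barh draws the first
# pair at the bottom, so the longest bar ends up on top.
sorted_items = sorted(word_counts.items(), key=lambda x: x[1])
words = [word for word, _ in sorted_items]
counts = [count for _, count in sorted_items]

fig, ax = plt.subplots()
# String y values become categories placed in the order they are given.
bars = ax.barh(words, counts)
ax.set_xlabel("Frequency")
# In an interactive session, call plt.show() here to display the plot.
```

Passing reverse=True to sorted() would instead put the longest bar at the bottom.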

Upvotes: 1

Sheldore

Reputation: 39052

EDIT (based on the comments)

Your code works fine on matplotlib 2.2.2; the issue seems to come from your list naming and some confusion between the lists. It should also work as expected on 3.0.2. Nevertheless, you might be interested in the following workaround:

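import matplotlib.pyplot as plt

# sorted_list is the list of (importance, feature) tuples from the question
features_sorted = []
importance_sorted = []

for i in sorted_list:
    features_sorted += [i[1]]
    importance_sorted += [i[0]]

plt.title("Feature importance", fontsize=15)
plt.xlabel("Importance", fontsize=13)

# Plot against integer positions, then label each position with its feature name
plt.barh(range(len(importance_sorted)), importance_sorted, color="green", edgecolor='green')
plt.yticks(range(len(importance_sorted)), features_sorted)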

[Image: resulting bar plot]

An alternative suggested by @tmdavison is to pass the labels through the tick_label parameter of barh instead of calling plt.yticks:

plt.barh(range(len(importance_sorted)), importance_sorted, color="green",
         edgecolor='green', tick_label=features_sorted)
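A self-contained sketch of the tick_label alternative, using only the first four entries of sorted_list from the question to keep it short. It swaps the explicit loop for zip(*...) unpacking, a standard Python idiom that is not part of either answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit it in an interactive session
import matplotlib.pyplot as plt

# First four (importance, feature) pairs from the question's sorted_list.
sorted_list = [(0.0017821837085763366, 'Activity'),
               (0.002649111136700579, 'PeakAccel'),
               (0.012697214182994502, 'VerticalPeak'),
               (0.014534726412631642, 'ROGState')]

# zip(*...) transposes the pairs into one tuple of importances and one of features.
importance_sorted, features_sorted = zip(*sorted_list)

fig, ax = plt.subplots()
ax.set_title("Feature importance", fontsize=15)
ax.set_xlabel("Importance", fontsize=13)

# Bars sit at integer y positions 0..n-1 (bottom to top);
# tick_label names each position with its feature.
bars = ax.barh(range(len(importance_sorted)), importance_sorted,
               color="green", edgecolor="green", tick_label=features_sorted)
```

Because the y positions are plain integers, the bar order does not depend on how matplotlib handles string categories.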

Upvotes: 7

ImportanceOfBeingErnest
ImportanceOfBeingErnest

Reputation: 339220

To avoid confusion arising from the other answer, note that the code in the question runs fine and gives the desired output with any matplotlib version >= 2.2.

import matplotlib
print(matplotlib.__version__)
import matplotlib.pyplot as plt


sorted_list = [(0.0017821837085763366, 'Activity'),
 (0.002649111136700579, 'PeakAccel'),
 (0.012697214182994502, 'VerticalPeak'),
 (0.014534726412631642, 'ROGState'),
 (0.016099772399198357, 'VerticalMin'),
 (0.022745210003295983, 'LateralPeak'),
 (0.028845102569277088, 'SagittalPeak'),
 (0.029479112475744584, 'LateralMin'),
 (0.04062328177704225, 'BR'),
 (0.08653071485979484, 'SagittalMin'),
 (0.09011618483921582, 'Posture'),
 (0.13598729040097057, 'HRV'),
 (0.22986192060475388, 'ROGTime'),
 (0.28804817462980353, 'HR')]

features_sorted = []
importance_sorted = []

for i in sorted_list:
    features_sorted += [i[1]]
    importance_sorted += [i[0]]

plt.title("Feature importance", fontsize=15)
plt.xlabel("Importance", fontsize=13)

plt.barh(features_sorted, importance_sorted, color="green", edgecolor='green')
plt.show()

[Image: resulting bar plot]

The issue the OP reports is most probably caused by giving distinct lists the same name and not restarting the kernel in between, or by similar non-reproducible state.
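One way to see how mixed-up names could produce the alphabetical plot: the unsorted_list printed in the question holds (feature, importance) pairs, the reverse of the (importance, feature) pairs the question's code builds. Python compares tuples element by element, so calling sorted() on the printed list orders it by feature name. This is a hypothesis consistent with the answer above, not a confirmed diagnosis; the sketch uses the first four printed pairs.

```python
# First four (feature, importance) pairs as printed for unsorted_list in the question.
pairs = [('HR', 0.28804817462980353),
         ('BR', 0.04062328177704225),
         ('Posture', 0.09011618483921582),
         ('Activity', 0.0017821837085763366)]

# Tuples compare element by element, so this sorts by the feature name first.
by_name = sorted(pairs)

# Sorting on the second element orders by importance instead.
by_importance = sorted(pairs, key=lambda pair: pair[1])

print([name for name, _ in by_name])        # ['Activity', 'BR', 'HR', 'Posture']
print([name for name, _ in by_importance])  # ['Activity', 'BR', 'Posture', 'HR']
```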

Upvotes: 0
