Sean

Reputation: 3385

Matplotlib bar chart for number of occurrences

I want to make a bar chart of the number of occurrences of each value in a list. More specifically, I start with a list like:

>>> print(some_list)
[2, 3, 10, 5, 20, 34, 50, 10, 10 ... ]

The list contains integers in the range [0, 2470]. What I want to do is plot the number of occurrences of each integer. The code that I wrote is:

from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd


sorted_list = sorted(some_list)
sorted_counted = Counter(sorted_list)

range_length = list(range(max(some_list))) # Get the largest value to get the range.
data_series = {}

for i in range_length:
    data_series[i] = 0 # Initialize series so that we have a template and we just have to fill in the values.

for key, value in sorted_counted.items():
    data_series[key] = value

data_series = pd.Series(data_series)
x_values = data_series.shape[0]

plt.bar(x_values, data_series.values)
plt.show()

When I run this code I get the following plot:

[Image: the resulting plot]

which isn't what I'm looking for.

In the plot that I'm expecting, the $x$ values are the integers in [0, 2470] and the $y$ values are the number of occurrences of each integer. It should look like a reversed exponential graph.

What's the problem with my code? Thanks in advance.

Upvotes: 0

Views: 4147

Answers (1)

Derek O

Reputation: 19545

The line x_values = data_series.shape[0] is causing the problem: it sets x_values to the size of the first dimension of data_series (a single integer, the length of the series), so plt.bar draws every bar at that one x position. Try x_values = data_series.index instead, which gives you all the integers from 0 up to the highest one occurring, one x position per bar.
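To make the difference concrete, here is a small sketch with a made-up four-entry series (the values are hypothetical, not taken from the question). It only compares what shape[0] and index return; it does not plot anything.

```python
import pandas as pd

# Hypothetical counts keyed by integer value (illustration only).
data_series = pd.Series({0: 0, 1: 2, 2: 0, 3: 5})

# shape[0] is one integer: the number of entries in the series.
length = data_series.shape[0]

# index holds one label per entry, which is what plt.bar needs for x.
positions = data_series.index.tolist()

print(length)     # a single number
print(positions)  # one x position per bar
```

Passing the single number as x gives plt.bar only one position for all the heights, while passing the index gives it one position per height.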

To show it's generalizable, here's what I got using a Poisson distribution.

from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

some_list = np.random.poisson(500, 2470).tolist()

sorted_list = sorted(some_list)
sorted_counted = Counter(sorted_list)

range_length = list(range(max(some_list))) # Get the largest value to get the range.
data_series = {}

for i in range_length:
    data_series[i] = 0 # Initialize series so that we have a template and we just have to fill in the values.

for key, value in sorted_counted.items():
    data_series[key] = value

data_series = pd.Series(data_series)
x_values = data_series.index

# you can customize the limits of the x-axis
# plt.xlim(0, max(some_list))
plt.bar(x_values, data_series.values)

plt.show() 

[Image: the resulting bar chart]
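As a possible shortcut, and assuming the list holds only non-negative integers as in the question, np.bincount can build the zero-filled counts in one step, without the dictionary template or the pandas Series. This is a sketch of an alternative, not the approach used above; the sample list is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data; substitute your own list of non-negative integers.
some_list = [2, 3, 10, 5, 20, 34, 50, 10, 10]

# counts[i] is the number of times the integer i occurs in some_list.
# Values that never occur get a count of 0 automatically.
counts = np.bincount(some_list)

# One x position per count: 0, 1, ..., max(some_list).
x_values = np.arange(len(counts))

fig, ax = plt.subplots()
ax.bar(x_values, counts)
ax.set_xlabel("value")
ax.set_ylabel("occurrences")
# Call plt.show() to display the figure.
```

Note that np.bincount raises an error on negative values, so this only applies when the data are non-negative integers.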

Upvotes: 2
