Reputation: 3385
I want to make a bar chart of the number of occurrences of each value in a list. More specifically, I start with a list like:
>>> print(some_list)
[2, 3, 10, 5, 20, 34, 50, 10, 10 ... ]
The list contains integers in the range [0, 2470]. What I want to do is plot the number of occurrences of each integer. The code that I wrote is:
from collections import Counter
import matplotlib.pyplot as plt
import pandas as pd
sorted_list = sorted(some_list)
sorted_counted = Counter(sorted_list)
range_length = list(range(max(some_list))) # Get the largest value to get the range.
data_series = {}
for i in range_length:
    data_series[i] = 0 # Initialize series so that we have a template and we just have to fill in the values.
for key, value in sorted_counted.items():
    data_series[key] = value
data_series = pd.Series(data_series)
x_values = data_series.shape[0]
plt.bar(x_values, data_series.values)
plt.show()
When I run this code, I get a plot that isn't what I'm looking for.
In the plot that I'm expecting, the $x$ values are the integers in [0, 2470] and the $y$ values are the number of occurrences of each integer. It should look like a reversed exponential graph.
What's the problem with my code? Thanks in advance.
Upvotes: 0
Views: 4147
Reputation: 19545
The line x_values = data_series.shape[0] is causing problems: it sets x_values to the size of the first dimension of data_series (a single integer), which is not what you want. Try x_values = data_series.index instead, which will give you all the integers up to the highest one occurring.
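To make the difference concrete, here is a minimal sketch on a small made-up Series (the toy counts below are my own, not from the question). As far as I can tell, passing a single number as x makes matplotlib draw every bar at that one position, which would explain the odd plot.

```python
import pandas as pd

# Toy data (hypothetical): occurrence counts for the integers 0..4.
data_series = pd.Series({0: 3, 1: 0, 2: 5, 3: 1, 4: 2})

# shape[0] is a single int: the number of entries in the Series.
n_rows = data_series.shape[0]

# The index holds one x position per entry, which is what plt.bar needs.
x_values = data_series.index.tolist()

print(n_rows)    # 5
print(x_values)  # [0, 1, 2, 3, 4]
```

Note that x_values has the same length as data_series.values, so each count gets its own bar.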
To show it's generalizable, here's what I got using a Poisson distribution.
from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
some_list = np.random.poisson(500, 2470).tolist()
sorted_list = sorted(some_list)
sorted_counted = Counter(sorted_list)
range_length = list(range(max(some_list))) # Get the largest value to get the range.
data_series = {}
for i in range_length:
    data_series[i] = 0 # Initialize series so that we have a template and we just have to fill in the values.
for key, value in sorted_counted.items():
    data_series[key] = value
data_series = pd.Series(data_series)
x_values = data_series.index
# you can customize the limits of the x-axis
# plt.xlim(0, max(some_list))
plt.bar(x_values, data_series.values)
plt.show()
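If you'd rather skip the intermediate dict and Series, the same counts can probably be built straight from the Counter. This is only a sketch, assuming the list holds non-negative integers; count_occurrences and plot_occurrences are names I made up for illustration.

```python
from collections import Counter


def count_occurrences(values):
    """Return (xs, counts) covering every integer from 0 to max(values)."""
    counter = Counter(values)
    xs = list(range(max(values) + 1))  # + 1 so the largest value is included
    counts = [counter[x] for x in xs]  # a Counter returns 0 for missing keys
    return xs, counts


def plot_occurrences(values):
    """Draw one bar per integer; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    xs, counts = count_occurrences(values)
    plt.bar(xs, counts)
    plt.xlabel("value")
    plt.ylabel("occurrences")
    plt.show()


# Small made-up sample to check the counting step.
xs, counts = count_occurrences([2, 3, 10, 5, 3, 10, 10])
print(counts[3], counts[10])  # 2 3
```

Calling plot_occurrences(some_list) should then give one bar per integer, with zeros for values that never occur.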
Upvotes: 2