Reputation: 11238
I have many different pandas.Series objects that look like this:
my_series:
0.0 10490405.0
1.0 3334931.0
2.0 2770406.0
3.0 2286555.0
4.0 1998229.0
5.0 1636747.0
6.0 1449938.0
7.0 1180900.0
8.0 1054964.0
9.0 869783.0
10.0 773747.0
11.0 653608.0
12.0 595688.0
...
682603.0 1.0
734265.0 1.0
783295.0 1.0
868135.0 1.0
These are the frequencies of my data: there are 10490405 zeros in my data, 3334931 ones, and so on. I want to plot a histogram.
I know I can do it using plt.bar:
plt.bar(my_series.index, my_series.values)
But it works badly because of the large number of unique values in my_series (there can be thousands!), so the bars in the plot are too narrow and become invisible.
So I really want to use hist, so that I can set the number of bins manually, and so on.
But I can't use my_series.hist(), because the series does not contain that many zeros; it holds just one value for the zero label.
Code to reproduce the problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
val = np.round([1000000/el**2 for el in range(1,1000)])
ind = [el*10+np.random.randint(10) for el in range(1,1000)]
my_series = pd.Series(val, ind)
plt.bar(my_series.index, my_series.values)
Since I already have a close vote and got a wrong answer, my problem description must be really bad. I want to add an example:
val1 = [100, 50, 25, 10, 10, 10]
ind1 = [0, 1, 2, 3, 4, 5]
my_series1 = pd.Series(val1, ind1)
my_series1.hist()
This is just hist() on the series values! So we can see that 10 has value 3 (because there are three of them in the series) and every other value has value 1 on the histogram. What I want to get:
the 0 label has value 100, the 1 label has value 50, and so on.
Upvotes: 1
Views: 1463
Reputation: 11238
I found one more inefficient solution :) but it looks the way I wanted, so:
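import itertools
# multiplying a one-element list by an integer repeats its element
func = lambda x,y: x*y
all_data = list(map(func, [[el] for el in my_series.index], [int(el) for el in my_series.values]))
# flatten the list of lists into one list of restored observations
merged = list(itertools.chain(*all_data))
plt.hist(merged, bins=6)
plt.show()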
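The idea here is:
[[el] for el in my_series.index] wraps each index label in a one-element list
[int(el) for el in my_series.values] converts each frequency to an integer
list(map(func, ...)) repeats each label as many times as its frequency
hist() is then called on the merged, restored data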
This is obviously inefficient, but in my task I need to calculate a lot of different parameters such as mean, std, etc. Otherwise I would need to write a function to calculate each of them from the frequencies. So I found a faster way: just restore the data and then use the built-ins.
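The restore step can probably be done more cheaply with np.repeat than with list multiplication. The following is only a sketch on the small example series from the question; the variable names are made up for illustration. It also shows that mean and std can be computed directly from the frequencies with np.average, which avoids expanding the data when the counts are very large.

```python
import numpy as np
import pandas as pd

# Toy frequency table from the question: index = value, data = count.
my_series1 = pd.Series([100, 50, 25, 10, 10, 10], index=[0, 1, 2, 3, 4, 5])

values = my_series1.index.to_numpy()
counts = my_series1.to_numpy().astype(int)

# Restore the raw observations: each value repeated `count` times.
restored = np.repeat(values, counts)

# Built-ins on the restored data.
mean_restored = restored.mean()
std_restored = restored.std()  # population std (ddof=0)

# The same statistics computed from the frequency table without expanding it.
mean_weighted = np.average(values, weights=counts)
std_weighted = np.sqrt(np.average((values - mean_weighted) ** 2, weights=counts))
```

With counts in the millions, as in the original series, the restored array can get large, so the weighted form may be the safer choice when only summary statistics are needed.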
Upvotes: 0
Reputation: 150805
You can group by index values and plot a bar chart:
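# change bins as needed
bins = np.linspace(my_series.index[0], my_series.index[-1], 25)
# include_lowest=True keeps the first index label, which sits exactly on the lowest bin edge
my_series.groupby(pd.cut(my_series.index, bins, include_lowest=True)).sum().plot.bar()
# your data is very skewed, so log scale helps.
plt.yscale('log');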
Output: [bar plot of the summed frequencies per bin, log-scaled y-axis]
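To see what the grouping does, here is a small sketch on the six-element example series from the question, with three bins instead of 25 (the names grouped and grouped_default are just for illustration). pd.cut builds right-closed intervals, so a label that equals the lowest bin edge is left out unless include_lowest=True is passed.

```python
import numpy as np
import pandas as pd

# Toy frequency table from the question: index = value, data = count.
my_series1 = pd.Series([100, 50, 25, 10, 10, 10], index=[0, 1, 2, 3, 4, 5])

# Three equal-width bins over the index range: edges 0, 1.67, 3.33, 5.
bins = np.linspace(my_series1.index[0], my_series1.index[-1], 4)

# With include_lowest=True the label 0 falls into the first bin.
grouped = my_series1.groupby(
    pd.cut(my_series1.index, bins, include_lowest=True), observed=False
).sum()

# With the default, label 0 gets no bin (NaN) and its count of 100 is dropped.
grouped_default = my_series1.groupby(
    pd.cut(my_series1.index, bins), observed=False
).sum()

print(grouped.tolist())          # summed counts per bin
print(grouped_default.tolist())  # first bin is missing the count for label 0
```

Each bar in the plot is then the total frequency of all labels that fall into one bin, which is what a histogram of the restored data would show.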
Upvotes: 1
Reputation: 56
Taken from https://matplotlib.org/3.1.1/gallery/statistics/hist.html :
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter
# Fixing random state for reproducibility
np.random.seed(19680801)
N_points = 100000
n_bins = 20
# Generate a normal distribution, center at x=0 and y=5
x = np.random.randn(N_points)
y = .4 * x + np.random.randn(100000) + 5
fig, axs = plt.subplots(1, 2, sharey=True, tight_layout=True)
# We can set the number of bins with the `bins` kwarg
axs[0].hist(x, bins=n_bins)
axs[1].hist(y, bins=n_bins)
You can adjust the number of bins to fit your data. Please upload your data so we can help in more detail.
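If the data is already a table of frequencies, as in the question, one option is the weights argument of hist: pass the index labels as the data and the frequencies as the weights. This is a minimal sketch on the small example series from the question; the Agg backend line is only there so it runs without a display and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Toy frequency table from the question: index = value, data = count.
my_series1 = pd.Series([100, 50, 25, 10, 10, 10], index=[0, 1, 2, 3, 4, 5])

fig, ax = plt.subplots()
# Each index label contributes its frequency, not 1, to the bin it falls in.
n, edges, _ = ax.hist(
    my_series1.index.to_numpy(),
    bins=3,
    weights=my_series1.to_numpy(),
)
print(n)      # bin heights: summed frequencies per bin
print(edges)  # bin edges chosen by hist
```

The bins argument works as in the gallery example above, so it can be set to any number of bins or to explicit edges.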
Upvotes: 0