I have a histogram like this:
and this code, which finds its peak (-21.5 in my case):
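```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde


def find_range(column):
    kde = gaussian_kde(column)
    no_samples = len(column)
    # Evaluate the KDE on an evenly spaced grid spanning the data
    samples = np.linspace(column.min(), column.max(), no_samples)
    probs = kde.evaluate(samples)
    # The grid point with the highest estimated density is the peak
    maxima_index = probs.argmax()
    maxima = samples[maxima_index]
    plt.scatter(samples, probs)  # , color='b', linewidths=0.05)
    plt.show()
    return [maxima]
```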
But I need to find the range that holds the dominant values of the histogram (for this histogram, roughly -30 to -5). Something like the point on each side of the peak where the probability density drops to 20% of the peak density.
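Stated a little more precisely (the notation is introduced here only for illustration: $\hat f$ is the KDE and $\hat x$ its peak), the wanted end points are the values closest to the peak, one on each side, where the density has fallen to 20% of the peak density:

$$
\hat x = \arg\max_x \hat f(x), \qquad
x_L = \max\{\, x < \hat x : \hat f(x) \le 0.2\,\hat f(\hat x) \,\}, \qquad
x_R = \min\{\, x > \hat x : \hat f(x) \le 0.2\,\hat f(\hat x) \,\}
$$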
How can I achieve this? I tried the following:
```python
t_right = list(filter(lambda tup: np.logical_and(tup[1] > maxima, probs[tup[0]] <= max(probs) * 0.2), enumerate(samples)))
```
but it returns many values, and I want only the single value where the curve crosses that level.
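The filter above keeps every sample right of the peak whose density is at or below 20% of the maximum, which is the whole right tail rather than just the crossing point. One way to get a single value is to keep only the match closest to the peak on each side. A minimal sketch, assuming `samples` and `probs` are the NumPy arrays built inside `find_range`; the helper name `crossing_points` and its `fraction` parameter are hypothetical:

```python
import numpy as np


def crossing_points(samples, probs, fraction=0.2):
    """Return (left, peak, right): the grid points closest to the peak, on each
    side, where probs has dropped to <= fraction * max(probs). A side with no
    such point returns None."""
    samples = np.asarray(samples)
    probs = np.asarray(probs)
    peak = int(probs.argmax())
    threshold = probs[peak] * fraction
    # Walk right from the peak: the first hit is the one closest to the peak.
    right = next((samples[i] for i in range(peak + 1, len(samples))
                  if probs[i] <= threshold), None)
    # Walk left from the peak for the same reason.
    left = next((samples[i] for i in range(peak - 1, -1, -1)
                 if probs[i] <= threshold), None)
    return left, samples[peak], right
```

Inside `find_range` it would be called as `t_left, maxima, t_right = crossing_points(samples, probs)`. The result is only as precise as the grid spacing of `samples`.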
This is my solution; I'd be glad to hear other ideas:
```python
import math

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde


def find_range(column):
    kde = gaussian_kde(column)
    no_samples = len(column)
    samples = np.linspace(column.min(), column.max(), no_samples)
    probs = kde.evaluate(samples)
    maxima_index = probs.argmax()
    maxima = samples[maxima_index]
    threshold = max(probs) * 0.2  # 20% of the peak density
    # Keep the (index, value) pairs right/left of the peak whose density is within abs_tol of the threshold.
    # zip(*...)[1] extracts the sample values; if the list is empty, that indexing raises IndexError.
    t_right_list = list(filter(lambda tup: np.logical_and(tup[1] > maxima, math.isclose(probs[tup[0]], threshold, abs_tol=0.00001)), enumerate(samples)))
    t_right = np.median(list(zip(*t_right_list))[1])
    t_left_list = list(filter(lambda tup: np.logical_and(tup[1] < maxima, math.isclose(probs[tup[0]], threshold, abs_tol=0.00001)), enumerate(samples)))
    t_left = np.median(list(zip(*t_left_list))[1])
    plt.scatter(samples, probs)  # , color='b', linewidths=0.05)
    plt.show()
    return [t_left, maxima, t_right]
```
If more than one value ends up in `t_right_list`/`t_left_list` (how many depends on `abs_tol`), the median reduces them to a single value.
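The `math.isclose` test relies on some grid point landing within `abs_tol` of the threshold; if none does, the list is empty and the indexing step fails. A sketch of an alternative that avoids the tolerance: walk outward from the peak to the first grid point at or below the threshold, then linearly interpolate between that point and the previous one. The `fraction` and `grid_size` parameters are assumptions for illustration (the original uses a fixed 0.2 and `len(column)` grid points), and plotting is left out:

```python
import numpy as np
from scipy.stats import gaussian_kde


def find_range(column, fraction=0.2, grid_size=1000):
    """Return (left, mode, right): where the KDE falls to `fraction`
    of its peak density on each side of the peak."""
    column = np.asarray(column, dtype=float)
    kde = gaussian_kde(column)
    samples = np.linspace(column.min(), column.max(), grid_size)
    probs = kde.evaluate(samples)
    peak = int(probs.argmax())
    threshold = probs[peak] * fraction

    def crossing(indices):
        # `indices` start at the peak and walk outward in one direction.
        for prev, cur in zip(indices[:-1], indices[1:]):
            if probs[cur] <= threshold:
                # Linear interpolation between the last point above the
                # threshold (prev) and the first point at or below it (cur).
                p0, p1 = probs[prev], probs[cur]
                x0, x1 = samples[prev], samples[cur]
                return x0 + (p0 - threshold) / (p0 - p1) * (x1 - x0)
        # The density never dropped that low inside the grid: use the grid edge.
        return samples[indices[-1]]

    right = crossing(np.arange(peak, grid_size))
    left = crossing(np.arange(peak, -1, -1))
    return left, samples[peak], right
```

Because each side stops at the first crossing, this always yields exactly one value per side, and the interpolation gives a result finer than the grid spacing.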
I'm not sure this is what you are looking for, but I found this article on Towards Data Science: https://towardsdatascience.com/take-your-histograms-to-the-next-level-using-matplotlib-5f093ad7b9d3
The code from that article is as follows:
```python
import matplotlib.pyplot as plt

# `avocado` is assumed to be a pandas Series of average prices, as in the article
fig, ax = plt.subplots()

# Plot histogram (density=True so it is on the same scale as the KDE)
avocado.plot(kind="hist", density=True, alpha=0.65, bins=15, ax=ax)
# Plot KDE
avocado.plot(kind="kde", ax=ax)

# Quantile lines
quant_5, quant_25, quant_50, quant_75, quant_95 = avocado.quantile(0.05), avocado.quantile(0.25), avocado.quantile(0.5), avocado.quantile(0.75), avocado.quantile(0.95)
# Each entry: [x position, line opacity, line height as a fraction of the axes height]
quants = [[quant_5, 0.6, 0.16], [quant_25, 0.8, 0.26], [quant_50, 1, 0.36], [quant_75, 0.8, 0.46], [quant_95, 0.6, 0.56]]
for i in quants:
    ax.axvline(i[0], alpha=i[1], ymax=i[2], linestyle=":")

# X
ax.set_xlabel("Average Price ($)")
# Limit x range to 0-4
x_start, x_end = 0, 4
ax.set_xlim(x_start, x_end)

# Y
ax.set_ylim(0, 1)
ax.set_yticklabels([])
ax.set_ylabel("")

# Annotations
ax.text(quant_5 - .1, 0.17, "5th", size=10, alpha=0.8)
ax.text(quant_25 - .13, 0.27, "25th", size=11, alpha=0.85)
ax.text(quant_50 - .13, 0.37, "50th", size=12, alpha=1)
ax.text(quant_75 - .13, 0.47, "75th", size=11, alpha=0.85)
ax.text(quant_95 - .25, 0.57, "95th Percentile", size=10, alpha=.8)

# Overall
ax.grid(False)
ax.set_title("Avocado Prices in U.S. Markets", size=17, pad=10)

# Remove ticks and spines (iterate over the spine objects so `ax` is not overwritten)
ax.tick_params(left=False, bottom=False)
for spine in ax.spines.values():
    spine.set_visible(False)

plt.show()
```
The output of the code above looks something like this:
I hope this helps! :)
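Applied to the question, the quantile idea gives a range straight from the data, with no KDE: for example, the 5th and 95th percentiles bound the central 90% of the values. This is a different criterion from the 20%-of-peak cut the question asks about; the 5%/95% defaults and the helper name `quantile_range` are assumptions for illustration:

```python
import numpy as np


def quantile_range(column, lower=0.05, upper=0.95):
    """Return the (lower, upper) quantiles of the data; with the defaults,
    the bounds of the central 90% of the values."""
    low, high = np.quantile(np.asarray(column, dtype=float), [lower, upper])
    return low, high
```

For a skewed or multi-peaked histogram the two criteria can give quite different ranges, so it is worth plotting both against the KDE.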