Reputation: 29
I want to count the values in each histogram bin and store those counts in a data frame.
a = smry_dmo.loc['Mean', 'Income']
b = smry_dmo.loc['Standard Deviation', 'Income']
plt.hist(dmo_df.Income, 10, color = 'magenta', edgecolor = 'black')
plt.title(rf'Distribution of Income: $\mu={a}$, $\sigma={b}$')
plt.xlabel('Income')
plt.ylabel('Frequency')
plt.show()
Let me know if what I'm asking isn't clear.
Thank you.
Upvotes: 0
Views: 1417
Reputation: 7903
plt.hist returns a tuple (n, bins, patches): the counts per bin, the bin edges, and the drawn bar objects. You just need to capture them so you can use them afterwards.
n, bins, patches = plt.hist(dmo_df.Income, 10, color = 'magenta', edgecolor = 'black')
Here is a small example to show what that looks like.
import numpy as np
import matplotlib.pyplot as plt

# x = np.random.randint(1,20, size=20)
x = np.array([10, 18, 6, 13, 2, 18, 5, 13, 13, 5, 11, 18, 1, 7, 8, 10, 12, 9, 17, 2])
n, bins, patches = plt.hist(x, bins=5, color = 'magenta', edgecolor = 'black')
plt.show()
print(n)
[3. 4. 5. 4. 4.]
print(bins)
[ 1. 4.4 7.8 11.2 14.6 18. ]
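If you only need the numbers and not the figure, a minimal sketch with np.histogram gives the same counts and edges, since plt.hist uses numpy's binning internally. The names counts and edges are my own; x is the same sample array as above.

```python
import numpy as np

x = np.array([10, 18, 6, 13, 2, 18, 5, 13, 13, 5, 11, 18, 1, 7, 8, 10, 12, 9, 17, 2])

# Same binning rule as plt.hist, but nothing is drawn.
# counts has one entry per bin; edges has one more entry than counts.
counts, edges = np.histogram(x, bins=5)

print(counts)
print(edges)
```

Note that np.histogram returns integer counts, while plt.hist returns them as floats.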
Following this answer, you can also use numpy to get, for each bin, an array of the data values that fall into it. The last bin includes its right edge, all other bins exclude it:
binlist = np.c_[bins[:-1], bins[1:]]
d = np.array(x)
for i in range(len(binlist)):
    if i == len(binlist) - 1:
        l = d[(d >= binlist[i, 0]) & (d <= binlist[i, 1])]
    else:
        l = d[(d >= binlist[i, 0]) & (d < binlist[i, 1])]
    print(l)
Output:
[2 1 2]
[6 5 5 7]
[10 11 8 10 9]
[13 13 13 12]
[18 18 18 17]
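The same grouping can probably be written without the explicit edge handling by using np.digitize. This is only a sketch: edges, idx and groups are names I made up, and it assumes every value lies between the first and last edge, which holds when the edges are computed from the data itself.

```python
import numpy as np

x = np.array([10, 18, 6, 13, 2, 18, 5, 13, 13, 5, 11, 18, 1, 7, 8, 10, 12, 9, 17, 2])
edges = np.histogram_bin_edges(x, bins=5)

# Passing only the interior edges maps each value to a bin index 0..4.
# Values at or above the last interior edge (including the maximum)
# land in the last bin, so no special case for the right edge is needed.
idx = np.digitize(x, edges[1:-1])

# One array of original values per bin, in their original order
groups = [x[idx == i] for i in range(len(edges) - 1)]

for g in groups:
    print(g)
```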
I'm not sure it is the best solution, but if you want a DataFrame with just the bin ranges and their counts, you could build it like this:
import pandas as pd

df1 = pd.DataFrame({
    'bin_index': list(range(len(n))),
    'counts': n,
    'left_bin_limit': bins[:-1],
    'right_bin_limit': bins[1:],
})
print(df1)
   bin_index  counts  left_bin_limit  right_bin_limit
0          0     3.0             1.0              4.4
1          1     4.0             4.4              7.8
2          2     5.0             7.8             11.2
3          3     4.0            11.2             14.6
4          4     4.0            14.6             18.0
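Another option, sketched here as an alternative rather than as the approach above, is to let pandas do the binning with pd.cut. The names edges, intervals and df2 are my own. One caveat: pd.cut closes intervals on the right by default, while plt.hist closes them on the left, so a value sitting exactly on an interior edge would be counted in a different bin. None of the sample values do, so the counts agree here.

```python
import numpy as np
import pandas as pd

x = np.array([10, 18, 6, 13, 2, 18, 5, 13, 13, 5, 11, 18, 1, 7, 8, 10, 12, 9, 17, 2])
edges = np.histogram_bin_edges(x, bins=5)

# include_lowest=True keeps the minimum value inside the first interval
intervals = pd.cut(x, bins=edges, include_lowest=True)

# Count values per interval and restore the natural bin order
counts = pd.Series(intervals).value_counts().sort_index()

# Turn the result into a two-column DataFrame: interval and count
df2 = counts.rename_axis('bin').reset_index(name='counts')
print(df2)
```

For the original question you would pass dmo_df.Income instead of x.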
Upvotes: 1