Reputation: 1058
I want to calculate the frequency of occurrence in multiple vectors, and I want the resulting number of bins to be consistent across vectors so it's easier to calculate the Wasserstein distance between them.
The following code shows that fitting a histogram with nbins=10 gives a different number of bins from run to run.
using StatsBase
for i in 1:10
    h = fit(Histogram, randn(1000), nbins=10)
    println(size(h.weights))
end
How can I make the number of bins consistent?
Upvotes: 4
Views: 485
Reputation: 12179
Passing nbins alone is not enough to be consistent across runs: StatsBase treats it as an approximate target and picks the edges from each data set's range, so the bin count can vary. To be completely consistent, also supply the exact bin positions. With Julia's StatsBase, you do that by passing the "edges" (bin boundaries) as the third argument. Here's a demo where each bin runs from i to i+1:
julia> fit(Histogram, randn(1000), -5:5)
Histogram{Int64, 1, Tuple{UnitRange{Int64}}}
edges:
-5:5
weights: [0, 2, 23, 139, 319, 355, 143, 18, 1, 0]
closed: left
isdensity: false
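To connect this back to the question, here is a minimal sketch of reusing one set of edges for several vectors and then comparing the histograms. The edge range (-5 to 5 in 10 bins), the sample vectors, and the helper w1 are assumptions made for illustration; w1 is a hand-rolled 1-D Wasserstein-1 distance that is only valid when both histograms share the same equal-width bins, not a StatsBase function.

```julia
using StatsBase

# Shared edges: 10 equal-width bins spanning -5 to 5. This range is an assumption
# that suits standard-normal samples; widen it for other data.
edges = range(-5, 5; length=11)

# Hypothetical input: three sample vectors of different lengths.
samples = [randn(1000), randn(500), randn(2000) .+ 0.5]

# Fitting every vector against the same edges gives the same number of bins.
hists = [fit(Histogram, x, edges) for x in samples]
println([length(h.weights) for h in hists])

# Convert counts to probabilities so vectors of different sizes are comparable.
probs = [h.weights ./ sum(h.weights) for h in hists]

# Illustrative helper (not part of StatsBase): 1-D Wasserstein-1 distance between
# two histograms on identical equal-width bins, computed as the bin width times
# the summed absolute difference of the cumulative distributions.
w1(p, q, width) = width * sum(abs.(cumsum(p) .- cumsum(q)))

println(w1(probs[1], probs[2], step(edges)))
```

Because the edges are fixed up front, every weights vector has length 10 regardless of the data, so the histograms can be compared element by element.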
Upvotes: 3