papabiceps

Reputation: 1058

Julia: How to make histograms have the same number of bins for vectors of equal size?

I want to calculate the frequency of occurrence in multiple vectors, and I want the resulting number of bins to be consistent across vectors so it's easier to calculate the Wasserstein distance among them.

The following code shows that fit(Histogram, ...) can return a different number of bins from run to run, even with nbins=10.

using StatsBase

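for i in 1:10
    # Fit a histogram to fresh random data, asking for 10 bins
    h = fit(Histogram, randn(1000), nbins=10)
    # Print the actual number of bins, which varies between runs
    println(size(h.weights))
end

How can I make the number of bins consistent?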

Upvotes: 4

Views: 485

Answers (1)

tholy

Reputation: 12179

One way to be completely consistent across runs is to supply not just the number of bins but also their exact positions. With Julia's StatsBase, you do that by passing the "edges" (bin boundaries) to fit; nbins on its own is only treated as an approximate target, so the actual count depends on the data. Here's a demo where each bin runs from i to i+1:

julia> fit(Histogram, randn(1000), -5:5)
Histogram{Int64, 1, Tuple{UnitRange{Int64}}}
edges:
  -5:5
weights: [0, 2, 23, 139, 319, 355, 143, 18, 1, 0]
closed: left
isdensity: false
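
To compare several vectors, the same edges can be reused for every fit, so all weight vectors line up bin by bin. Here is a minimal sketch of that idea; the names edges, x, y, hx and hy are made up for illustration, and the range -5:5 is an assumption that suits standard-normal data. As far as I know, values falling outside the edges are simply not counted, so pick a range wide enough for your data.

```julia
using StatsBase

# Assumed shared edges: 10 unit-width bins covering [-5, 5)
edges = -5:5

# Two sample vectors of equal size (stand-ins for your real data)
x = randn(1000)
y = randn(1000)

# Fit both histograms against the same edges
hx = fit(Histogram, x, edges)
hy = fit(Histogram, y, edges)

# Each weight vector has length(edges) - 1 == 10 entries
println(length(hx.weights), " ", length(hy.weights))
```

Because both histograms share identical edges, hx.weights and hy.weights can be compared element-wise, or normalized and passed to whatever distance function you use.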

Upvotes: 3
