Reza Afzalan

Reputation: 5756

Histogram calculation in julia-lang

According to the Julia documentation:

hist(v[, n]) → e, counts

Compute the histogram of v, optionally using approximately n bins. The return values are a range e, which correspond to the edges of the bins, and counts containing the number of elements of v in each bin. Note: Julia does not ignore NaN values in the computation.

I chose a sample range of data:

    testdata = 0:1:10;

Then I used the hist function to calculate histograms with 1 to 5 bins:

    hist(testdata,1) # => (-10.0:10.0:10.0,[1,10])
    hist(testdata,2) # => (-5.0:5.0:10.0,[1,5,5])
    hist(testdata,3) # => (-5.0:5.0:10.0,[1,5,5])
    hist(testdata,4) # => (-5.0:5.0:10.0,[1,5,5])
    hist(testdata,5) # => (-2.0:2.0:10.0,[1,2,2,2,2,2])

As you can see, when I ask for 1 bin it calculates 2 bins, and when I ask for 2 bins it calculates 3.

Why does this happen?

Upvotes: 7

Views: 5531

Answers (3)

Maciej Fender

Reputation: 331

In recent versions of Julia, the hist function is no longer part of Base.

To calculate a histogram, one should use StatsBase.Histogram and StatsBase.fit, e.g.:

    using StatsBase
    h = fit(Histogram, rand(100))
    print(h)

Output:

Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}}
edges:
  0.0:0.2:1.0
weights: [21, 22, 17, 16, 24]
closed: left
isdensity: false
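
For the data in the question, the same fit call can be given either a bin-count hint or explicit edges. The following is a minimal sketch, assuming StatsBase is installed; the variable names h_auto and h_exact are only illustrative. With closed=:right and edges starting below the minimum, it should give the same counts as the old hist(testdata,5) call shown in the question.

```julia
using StatsBase

testdata = 0:10

# Ask for roughly five bins; as with the old `hist`, the count is only a hint.
h_auto = fit(Histogram, testdata; nbins=5)

# Pass explicit edges to control the bins exactly. With closed=:right each
# bin is (lo, hi], so the first edge has to lie below the minimum value.
h_exact = fit(Histogram, testdata, -2:2:10; closed=:right)

println(h_auto.edges[1])    # the edges StatsBase picked for the hint
println(h_exact.weights)    # counts per bin for the explicit edges
```

The default is closed=:left, so the closed keyword is set explicitly here to mirror the right-closed bins implied by the counts in the question.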

Upvotes: 1

Simon Byrne

Reputation: 7874

As the person who wrote the underlying function: the aim is to get bin widths that are "nice" in terms of a base-10 counting system (i.e. 10^k, 2×10^k, 5×10^k). If you want more control you can also specify the exact bin edges.
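
To illustrate what "nice" widths mean in practice, here is a rough sketch of rounding a raw bin width up to 1, 2 or 5 times a power of ten. The function name nice_step and the exact thresholds are assumptions made for illustration, not the code in Base; they were chosen because they are consistent with the step sizes in the question's output.

```julia
# Hypothetical helper (not the Base implementation): round a raw bin
# width `bw` up to a "nice" value of the form 1, 2 or 5 times 10^k.
function nice_step(bw::Real)
    e = 10.0^floor(log10(bw))   # power of ten at or just below bw
    r = bw / e                  # how many times that power fits into bw
    if r <= 1
        return e
    elseif r <= 2
        return 2 * e
    elseif r <= 5
        return 5 * e
    else
        return 10 * e
    end
end

# Raw widths for the question's data: (10 - 0) / n for n = 1:5
for n in 1:5
    println(n, " requested bins, nice step = ", nice_step(10 / n))
end
```

Because requests for 2, 3 and 4 bins all round to the same step under this sketch, they end up with identical edges, which matches the three identical results in the question.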

Upvotes: 9

Nils Gudat

Reputation: 13800

The key word in the documentation is "approximately". You can check what hist is actually doing yourself by reading its source in Julia's Base module.

When you call hist(testdata,3), you're actually calling

hist(v::AbstractVector, n::Integer) = hist(v,histrange(v,n))

That is, in a first step the n argument is converted into a FloatRange by the histrange function, whose source is also part of Base. The calculation of these steps is not entirely straightforward, so you should play around with this function a bit to figure out how it constructs the range that forms the basis of the histogram.
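
One way to see where the extra bin comes from is to rebuild the edges and counts by hand. The sketch below is not the Base source; right_closed_edges and count_right_closed are hypothetical helpers, written on the assumption (consistent with the counts in the question) that each bin is right-closed, i.e. (lo, hi]. Under that assumption the first edge has to sit strictly below the minimum, so a minimum that lies exactly on the step grid gets a bin of its own.

```julia
# Hypothetical helper: edges on a grid of width `step` that cover `v`
# with right-closed bins (edges[i], edges[i+1]].
function right_closed_edges(v, step)
    lo, hi = extrema(v)
    start = step * (ceil(lo / step) - 1)      # grid point strictly below lo
    nbins = ceil(Int, (hi - start) / step)    # bins needed to reach hi
    return start:step:(start + nbins * step)
end

# Hypothetical helper: count the elements of `v` in each right-closed bin.
function count_right_closed(v, edges)
    counts = zeros(Int, length(edges) - 1)
    for x in v
        # searchsortedfirst finds the first edge >= x; the bin is one before it
        i = searchsortedfirst(edges, x) - 1
        if 1 <= i <= length(counts)
            counts[i] += 1
        end
    end
    return counts
end

testdata = 0:10
edges = right_closed_edges(testdata, 2.0)
println(edges)
println(count_right_closed(testdata, edges))
```

With a step of 2.0 the minimum 0 lands in the bin (-2, 0], which is the lone 1 at the front of the counts in the question.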

Upvotes: 5
