Reputation: 362
MATLAB's `[n, mapx] = histc(x, bin_edges)` returns the counts of `x` in each bin as `n`. It also returns a map, `mapx`, which has the same length as `x` and holds the index of the bin that each element of `x` was placed into.
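To make the target semantics concrete, here is a minimal Julia sketch of what histc's two outputs correspond to. The name `histc_like` is hypothetical. The edge conventions are my reading of MATLAB's histc documentation, not something stated in this thread: bin `k` holds `edges[k] <= x < edges[k+1]`, the last bin holds only values equal to `edges[end]`, and out-of-range values get index 0.

```julia
# Hypothetical helper mimicking MATLAB's [n, mapx] = histc(x, edges).
# Assumed conventions: bin k holds edges[k] <= xi < edges[k+1];
# the last "bin" holds only xi == edges[end]; anything else maps to 0.
function histc_like(x::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    nb = length(edges)
    n = zeros(Int, nb)            # counts per bin
    mapx = zeros(Int, length(x))  # bin index of each element (0 = not binned)
    for (i, xi) in enumerate(x)
        k = searchsortedlast(edges, xi)  # last edge <= xi (0 if xi < edges[1])
        if 1 <= k < nb || (k == nb && xi == edges[end])
            mapx[i] = k
            n[k] += 1
        end
    end
    return n, mapx
end

n, mapx = histc_like([0.5, 1.0, 2.5, 3.0, 4.0, -1.0], [0, 1, 2, 3])
# n == [1, 1, 1, 1], mapx == [1, 2, 3, 4, 0, 0]
```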
I can do the same thing in Julia as follows:
```julia
using StatsBase

x = rand(1000)
bin_e = 0:0.1:1
h = fit(Histogram, x, bin_e)
# index of the first edge >= z, minus 1, is the bin containing z
yx = map(z -> findnext(z .<= h.edges[1], 1), x) .- 1
```
Is this the "right way" to do this? It seems a bit kludgy.
Upvotes: 3
Views: 968
Reputation: 334
I stumbled across this question when I was trying to figure out how many occurrences of each value I had in a list of values. If each value is in its own bin (as for categorical data, or integer data with a small number of unique values), this is what one would be plotting in a histogram.
If that is what you want, then `countmap()` in the StatsBase package is just what you need.
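A minimal example of `countmap`, which returns a `Dict` from each distinct value to the number of times it occurs (the sample data is made up for illustration):

```julia
using StatsBase

# countmap returns a Dict mapping each distinct value to its number of occurrences
data = [1, 2, 2, 3, 3, 3]
cm = countmap(data)
# cm[1] == 1, cm[2] == 2, cm[3] == 3
```

Note that this counts exact values rather than mapping each element to a bin, so it only replaces `histc` when every value really is its own bin.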
Upvotes: 1
Reputation: 362
After looking through the code for Histogram.jl, I found that they already include a function, `binindex`. So this solution is probably the best:
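```julia
using StatsBase

x = 0:0.001:10
# bins closed on the left: [a, b)
h1 = fit(Histogram, x, 0:10, closed=:left)
xmap1 = StatsBase.binindex.(Ref(h1), x)
# bins closed on the right: (a, b]
h2 = fit(Histogram, x, 0:10, closed=:right)
xmap2 = StatsBase.binindex.(Ref(h2), x)
```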
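The `closed` keyword only matters for points that fall exactly on an edge. A small check, assuming `binindex` follows the histogram's `closed` setting (which is how I read the StatsBase source: `searchsortedlast` for `:left`, `searchsortedfirst(...) - 1` for `:right`):

```julia
using StatsBase

x = 0:0.001:10
h1 = fit(Histogram, x, 0:10, closed=:left)   # bins [k-1, k)
h2 = fit(Histogram, x, 0:10, closed=:right)  # bins (k-1, k]

# 1.0 sits exactly on an edge: left-closed puts it in bin 2, right-closed in bin 1
left_idx = StatsBase.binindex(h1, 1.0)
right_idx = StatsBase.binindex(h2, 1.0)

# an interior point lands in the same bin under either convention
same_interior = StatsBase.binindex(h1, 1.5) == StatsBase.binindex(h2, 1.5)
```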
Upvotes: 2
Reputation: 10127
Inspired by this Python question, you should be able to define a small function that delivers the desired mapping (modulo conventions):
```julia
binindices(edges, data) = searchsortedlast.(Ref(edges), data)
```
Note that the bin edges are sorted, so we can use `searchsortedlast` to get the index of the last bin edge less than or equal to a data point. Broadcasting this over all of the data gives the mapping. The `Ref(edges)` tells broadcasting to treat `edges` as a scalar, so the full array is passed to every call.
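A small worked example of what "modulo conventions" means in practice. The right-closed variant `binindices_right` is a hypothetical addition, not part of the answer above; it uses `searchsortedfirst` so that a point equal to an edge goes into the bin on its left.

```julia
# left-closed convention: index of the last edge <= each data point
binindices(edges, data) = searchsortedlast.(Ref(edges), data)

# hypothetical right-closed variant: bins (edges[k], edges[k+1]]
binindices_right(edges, data) = searchsortedfirst.(Ref(edges), data) .- 1

edges = [0, 1, 2, 3]
data = [0.5, 1.0, 2.9, -1.0, 5.0]

left = binindices(edges, data)         # [1, 2, 3, 0, 4]
right = binindices_right(edges, data)  # [1, 1, 3, 0, 4]
```

With the left-closed version, 0 means "below the first edge" and `length(edges)` means "at or above the last edge", so out-of-range points need to be filtered or clamped depending on which convention you want.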
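Although conceptually identical to your solution, this approach is about 13x faster on my machine, presumably because it binary-searches the sorted edges instead of allocating a Boolean comparison vector for every data point.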
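A rough way to compare the two approaches yourself, using only Base. The helper names `map_findnext` and `map_search` are hypothetical, and the data are chosen so that no point lies exactly on an edge (where the two conventions differ). Timings will vary by machine; BenchmarkTools' `@btime` gives more careful measurements.

```julia
edges = 0:0.1:1
# deterministic points strictly inside (0, 1), none equal to an edge
x = [(i - 0.5) / 1000 for i in 1:1000]

# question's approach: first edge >= z, minus 1
map_findnext(edges, x) = map(z -> findnext(z .<= edges, 1), x) .- 1
# answer's approach: last edge <= z, via binary search
map_search(edges, x) = searchsortedlast.(Ref(edges), x)

# run once so compilation is not included in the timings
a = map_findnext(edges, x)
b = map_search(edges, x)

t1 = @elapsed for _ in 1:100; map_findnext(edges, x); end
t2 = @elapsed for _ in 1:100; map_search(edges, x); end
println("findnext: $t1 s, searchsortedlast: $t2 s")
```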
I filed an issue over at StatsBase.jl's github page suggesting to add this as a feature.
Upvotes: 4