Alexis

Reputation: 120

Given the scipy.stats.binned_statistic function, how do I work with bins of different sizes?

My apologies if this question already exists and has been answered (I have searched and have not found an answer).

So, I have an array of integers a = [1,2,2,2,3,4] and I want to get a statistic (in this case the mean) for each interval, using the bins [0,1.5), [1.5,2.5), [2.5,5).

As you can see, the intervals do NOT have the same length. I tried this:

from scipy.stats import binned_statistic
data = [1,2,2,2,3,4]
bin_means = binned_statistic(data, data, bins=3, range=(0, 5))

bin_means then contains the following:

BinnedStatisticResult(statistic=array([1.  , 2.25, 4.  ]), bin_edges=array([0.        , 1.66666667, 3.33333333, 5.        ]), binnumber=array([1, 2, 2, 2, 2, 3], dtype=int32))

What I understand is that the bins are [0,1.66..7), [1.66..7,3.33..), [3.33..,5), which are not the intervals I want.
I do not want these equal-length intervals. Can someone explain how I can do that? Also, can someone explain the two main parameters of stats.binned_statistic ("x" and "values")? It would be useful. Thanks in advance.

Upvotes: 1

Views: 5339

Answers (1)

ShaharA

Reputation: 903

Basically, as you can see in the documentation, bins can be a sequence of scalars giving the bin edges, and those edges do not have to be evenly spaced.

So you can just use:

bin_means = binned_statistic(data, data, bins=[0, 1.5, 2.5, 5], range=(0, 5))
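To check that the explicit edges give the intervals from the question, here is a minimal sketch using the question's own data. The variable names edges and result are just illustrative, and range is left out here on the assumption that the explicit edges already fix the bin limits.

```python
import numpy as np
from scipy.stats import binned_statistic

data = [1, 2, 2, 2, 3, 4]
edges = [0, 1.5, 2.5, 5]  # unequal-width bins, given as explicit edges

# x and values are both `data`, so this is the mean of the data in each bin
result = binned_statistic(data, data, statistic="mean", bins=edges)

print(result.statistic)  # per-bin means: 1.0, 2.0 and 3.5
print(result.bin_edges)  # the edges that were passed in
print(result.binnumber)  # 1-based bin index of each data point
```

Each bin is half-open except the last one, which also includes its right edge, so a value of exactly 5 would still land in the third bin.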

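Regarding the two parameters: x is the data that gets binned, i.e. it decides which bin each point falls into. values is intended to let you bin on x but calculate the statistic on another measure (or multiple measures) attached to each data point. For example, you could bin people by their height, but calculate the mean of their weight within those bins. In your call both are the same array, so you get the mean of the data itself in each bin.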
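The height and weight case could look roughly like the sketch below. The numbers are made up purely for illustration, and the names heights, weights and height_edges are hypothetical.

```python
import numpy as np
from scipy.stats import binned_statistic

# Made-up sample: one height (cm) and one weight (kg) per person
heights = [150, 160, 165, 172, 180, 191]
weights = [50, 58, 62, 70, 82, 95]

# Unequal-width height bins: [140, 160), [160, 175), [175, 200]
height_edges = [140, 160, 175, 200]

# x = heights decides the bin; values = weights is what gets averaged
mean_weight = binned_statistic(heights, weights, statistic="mean", bins=height_edges)

print(mean_weight.statistic)  # mean weight of the people in each height bin
print(mean_weight.binnumber)  # which height bin each person fell into
```

Swapping statistic="mean" for "median", "count", "sum" and so on changes what is computed per bin without changing how the points are binned.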

Upvotes: 7
