Reputation: 73
I am trying to define my own binning and calculate the mean of some other columns of my dataframe over these bins. Unfortunately, it only works with integer inputs, as you can see below. Here "step_size" defines the width of one bin, and I would like to use float values such as 0.109, which corresponds to 0.109 seconds. Do you have any idea how I can do this? I think the problem is in the definition of "create_bins", but I cannot fix it.
The goal is to get this: [(0, 0.109), (0.109, 0.218), (0.218, 0.327), ...]
Greets
# =============================================================================
# Define parameters
# =============================================================================
seconds_min = 0
seconds_max = 9000
step_size = 1
bin_number = int((seconds_max-seconds_min)/step_size)
# =============================================================================
# Define function to create your own individual binning
# lower_bound defines the lowest value of the binning interval
# width defines the width of the binning interval
# quantity defines the number of bins
# =============================================================================
def create_bins(lower_bound, width, quantity):
    bins = []
    for low in range(lower_bound,
                     lower_bound + quantity * width + 1, width):
        bins.append((low, low+width))
    return bins
# =============================================================================
# Create binning list
# =============================================================================
bin_list = create_bins(lower_bound=seconds_min,
                       width=step_size,
                       quantity=bin_number)
print(bin_list)
Upvotes: 0
Views: 146
Reputation: 9481
no numpy:
max_bin=100
min_bin=0
step_size=0.109
number_of_bins = int(1+((max_bin-min_bin)/step_size)) # +1 to cover the whole interval
bins = []
for a in range(number_of_bins):
    bins.append((a*step_size, (a+1)*step_size))
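The same idea can be wrapped in a function shaped like the create_bins from the question. This is only a sketch: the upper_bound parameter and the rounding to 9 decimals are my own assumptions, not part of the original code. Each edge is computed from its index rather than by repeated addition, so floating-point error does not accumulate, and the rounding hides artifacts such as 0.32699999999999996.

```python
import math


def create_bins(lower_bound, upper_bound, width):
    """Return (low, high) tuples of size `width` covering [lower_bound, upper_bound]."""
    # Round up so the last bin reaches (or passes) upper_bound.
    quantity = math.ceil((upper_bound - lower_bound) / width)
    # Compute every edge from its index; rounding to 9 decimals is an
    # arbitrary choice that hides binary floating-point noise.
    edges = [round(lower_bound + i * width, 9) for i in range(quantity + 1)]
    # Neighbouring edges form one bin, so adjacent bins share a boundary exactly.
    return list(zip(edges[:-1], edges[1:]))


bins = create_bins(0, 1, 0.109)
print(bins[:3])  # [(0.0, 0.109), (0.109, 0.218), (0.218, 0.327)]
```

Because every bin is built from the same list of edges, the upper bound of one bin is always identical to the lower bound of the next.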
Upvotes: 0
Reputation: 12503
Here's a simple way to do it using zip and numpy's arange. I've put the upper limit at 5, but you can, of course, choose other numbers.
import numpy as np
list(zip(np.arange(0, 5, .109), np.arange(.109, 5, .109)))
The result is:
[(0.0, 0.109),
(0.109, 0.218),
(0.218, 0.327),
(0.327, 0.436),
(0.436, 0.545),
(0.545, 0.654),
(0.654, 0.763),
...
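The one-liner relies on zip silently truncating to the shorter of the two arrays. A variant that makes the bin count explicit is sketched below. The names start, stop, step, and n_bins are my own, and the limits are the same as above. It builds one array of edges from integer indices, which sidesteps the length ambiguity np.arange can show with float steps, and then pairs neighbouring edges.

```python
import numpy as np

start, stop, step = 0.0, 5.0, 0.109  # same limits as the one-liner above

# Number of full-width bins that fit below `stop`.
n_bins = int((stop - start) / step)
# Integer indices times the step give n_bins + 1 evenly spaced edges.
edges = start + step * np.arange(n_bins + 1)
# Pair each edge with its successor; .tolist() yields plain Python floats.
bins = list(zip(edges[:-1].tolist(), edges[1:].tolist()))

print(len(bins))  # 45
```

The edges array can also be passed directly to binning functions that expect bin boundaries rather than a list of tuples.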
Upvotes: 1
Reputation: 1213
The problem is that the built-in range function only accepts integer arguments, so it cannot step by a float.
You can use the numeric_range function from more_itertools for this:
from more_itertools import numeric_range
seconds_min = 0
seconds_max = 9
step_size = 0.109
bin_number = int((seconds_max-seconds_min)/step_size)
def create_bins(lower_bound, width, quantity):
    bins = []
    for low in numeric_range(lower_bound,
                             lower_bound + quantity * width + 1, width):
        bins.append((low, low+width))
    return bins
bin_list = create_bins(lower_bound=seconds_min,
                       width=step_size,
                       quantity=bin_number)
print(bin_list)
# [(0.0, 0.109), (0.109, 0.218), (0.218, 0.327) ... ]
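To see the failure directly, and for cases where installing more_itertools is not an option, here is a small sketch. float_range is a hypothetical helper of my own, not a standard-library or more_itertools function. It counts in integer steps and multiplies, which is roughly what a float-aware range has to do.

```python
# The built-in range rejects a float step outright.
try:
    range(0, 1, 0.109)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'float' object cannot be interpreted as an integer


def float_range(start, stop, step):
    """Hypothetical helper: yield start, start + step, ... while below stop."""
    i = 0
    # Multiply by an integer counter instead of adding step repeatedly,
    # so rounding error does not build up over many iterations.
    while start + i * step < stop:
        yield start + i * step
        i += 1


lows = list(float_range(0, 0.5, 0.109))
print(len(lows))  # 5
```

Swapping float_range in for numeric_range in create_bins should behave the same way for a positive step.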
Upvotes: 2