I am trying to take a number and multiply it by a value that depends on which interval the number falls within.
I grouped column B of my pandas DataFrame by the bin that the corresponding value of column A falls into:
bins = pd.cut(df['A'], 50)
grouped = df['B'].groupby(bins)
interval_averages = grouped.mean()
A
(0.00548, 0.0209] 0.010970
(0.0209, 0.0357] 0.019546
(0.0357, 0.0504] 0.036205
(0.0504, 0.0651] 0.053656
(0.0651, 0.0798] 0.068580
(0.0798, 0.0946] 0.086754
(0.0946, 0.109] 0.094038
(0.109, 0.124] 0.114710
(0.124, 0.139] 0.136236
(0.139, 0.153] 0.142115
(0.153, 0.168] 0.161752
(0.168, 0.183] 0.185066
(0.183, 0.198] 0.205451
I need to check which interval a number falls into and then multiply the number by the average of column B for that interval.
From the docs I know I can use the in keyword to check whether a number is in an interval, but I cannot find how to access the value for a given interval. I also don't want to loop through the Series checking whether the number is in each interval, since that seems slow.
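A minimal sketch of the two pieces mentioned above, using a toy Series with made-up values rather than the question's data. A pd.Interval supports the in check, and a Series whose index is a pd.IntervalIndex can be looked up directly with a scalar through .loc, with no Python loop. Note that the grouped means in the question are indexed by a CategoricalIndex of intervals rather than an IntervalIndex, so this assumes the means have first been placed on an IntervalIndex.

```python
import pandas as pd

# A single pd.Interval supports the `in` check from the docs.
iv = pd.Interval(0.00548, 0.0209, closed="right")
print(0.015 in iv)  # True

# Hypothetical bin edges and means, placed on an IntervalIndex (right-closed by default).
idx = pd.IntervalIndex.from_breaks([0.0, 0.1, 0.2, 0.3])
means = pd.Series([0.05, 0.15, 0.25], index=idx)

# .loc with a scalar returns the value of the interval containing it: (0.1, 0.2].
print(means.loc[0.15])  # 0.15
```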
Does anybody know how to do this efficiently?
Thanks a lot.
You can store the numbers you want to look up in an array and pass that array to cut() together with the same bin edges you used for the DataFrame (pd.cut(df['A'], 50, retbins=True) returns those edges alongside the binned values). This gives you the bin each number falls into, which tells you which row of the grouped means holds the value you need, so you can read it by position with iloc.
Hopefully this helps a bit.
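A minimal sketch of the approach above, under a few assumptions: the DataFrame is randomly generated stand-in data (not the asker's), every looked-up number lies inside the range covered by the bins (otherwise cut() yields NaN and iloc fails), and observed=False is passed to groupby so that empty bins are kept and row positions line up with bin numbers. labels=False makes cut() return the integer position of each bin instead of the interval itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.uniform(0, 0.2, 1000)})
df["B"] = df["A"] + rng.normal(0, 0.005, len(df))

# retbins=True also returns the bin edges so the same bins can be reused later.
bins, edges = pd.cut(df["A"], 50, retbins=True)

# observed=False keeps all 50 bins (in order), so row i is the mean of bin i.
interval_averages = df["B"].groupby(bins, observed=False).mean()

# Numbers to look up; assumed to fall inside the range spanned by `edges`.
values = np.array([0.015, 0.07, 0.19])

# labels=False returns each value's bin number instead of an Interval.
codes = pd.cut(values, edges, labels=False)

# One vectorized positional lookup, then an element-wise multiply.
result = values * interval_averages.iloc[codes].to_numpy()
print(result)
```

Because the lookup is a single iloc call on an integer array, no Python-level loop over the intervals is needed, which addresses the speed concern in the question.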