mikarific

Reputation: 13

Converting float into range in Python

I am doing some data analysis with pandas and am struggling to find a nice, clean way of grouping numbers into ranges. I have a data frame with a column of floats; however, I am not interested in the exact number, only in a rough range. Ultimately I want to run a pivot and count how many values fall in each range. Therefore, ideally I would create a new column in my data frame that converts my column of floats into a range. Say df['number'] = 3.5; then df['range'] = 0-10.

The ranges should be 0-10, 10-20, ... >100

This may sound very arbitrary, but I've been struggling to find an answer to this. Many thanks!

Upvotes: 0

Views: 2286

Answers (3)

TomAugspurger

Reputation: 28956

pandas has a cut function (pd.cut) for this:

In [12]: s = pd.Series(np.random.uniform(0, 110, 100))

In [13]: s
Out[13]:
0      2.652461
1     46.536276
2      6.455352
3      6.075963
4     40.013378
        ...
95    39.775493
96    99.688307
97    41.064469
98    91.401904
99    60.580600
dtype: float64

In [14]: cuts = np.arange(0, 101, 10)

In [15]: pd.cut(s, cuts)
Out[15]:
0       (0, 10]
1      (40, 50]
2       (0, 10]
3       (0, 10]
4      (40, 50]
        ...
95     (30, 40]
96    (90, 100]
97     (40, 50]
98    (90, 100]
99     (60, 70]
dtype: category
Categories (10, object): [(0, 10] < (10, 20] < (20, 30] < (30, 40] ... (60, 70] < (70, 80] < (80, 90] <
                          (90, 100]]

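See the docs for the right and include_lowest parameters, which control what happens at the endpoints. With these cuts, a value of exactly 0 or anything above 100 falls outside every bin and comes back as NaN.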
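A minimal sketch of getting the question's buckets, assuming a reasonably recent pandas: adding np.inf as the last edge gives a catch-all (100, inf] bin, include_lowest=True makes the first bin include 0, and the labels list (an illustrative choice, not something pd.cut produces on its own) turns the intervals into strings like "0-10" and ">100". Passing right=False instead would make the bins closed on the left: [0, 10), [10, 20), and so on.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.0, 3.5, 10.0, 57.6, 100.0, 104.2])

# Edges 0, 10, ..., 100, plus an open-ended top bin for anything above 100
edges = list(range(0, 101, 10)) + [np.inf]

# Illustrative labels, one per bin: "0-10", ..., "90-100", ">100"
labels = [f"{lo}-{lo + 10}" for lo in range(0, 100, 10)] + [">100"]

# Default right=True: bins are (0, 10], (10, 20], ...;
# include_lowest=True also puts exactly 0 into the first bin
ranges = pd.cut(values, bins=edges, include_lowest=True, labels=labels)
print(ranges.tolist())
```

With the default right=True, a value of exactly 10 lands in "0-10" and exactly 100 lands in "90-100".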

Note that since pandas 0.20, the categories of the result are Interval objects (backed by an IntervalIndex) rather than plain strings, which makes things even nicer.

To get your counts per interval, use the value_counts method:

In [17]: pd.cut(s, cuts).value_counts()
Out[17]:
(30, 40]     15
(40, 50]     13
(50, 60]     12
(60, 70]     10
(0, 10]      10
(90, 100]     8
(70, 80]      8
(80, 90]      7
(10, 20]      6
(20, 30]      3
dtype: int64
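value_counts orders its result by frequency. For the pivot-style table the question describes, a sketch (with made-up data; the column names number and range follow the question) stores the bins as a new column and then sorts the counts back into interval order. Because the column is categorical, bins with no values still appear with a count of 0.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the question's column of floats
df = pd.DataFrame({"number": [3.5, 7.1, 12.0, 45.9, 48.2, 99.9, 130.0]})

# Bins (0, 10], ..., (90, 100], plus a catch-all (100, inf]; 0 itself is included
edges = list(range(0, 101, 10)) + [np.inf]
df["range"] = pd.cut(df["number"], bins=edges, include_lowest=True)

# value_counts orders by frequency; sort_index restores interval order
counts = df["range"].value_counts().sort_index()
print(counts)
```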

Upvotes: 1

Zachary Cross

Reputation: 2318

Using the properties of integer division should help. Because you want ranges in units of 10, dividing a number by 10 (13.5 / 10 == 1.35), rounding it down to an integer (floor(1.35) == 1), and then multiplying by 10 (1 * 10 == 10) converts the number to the low end of the range it falls into. Rounding down with np.floor, rather than truncating toward zero with int(), keeps negative numbers in the right range (-3.5 maps to -10, not 0). This might still need some refinement (for example, a catch-all bucket above 100), but you could try something like:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'vals': [3.5, 4.2, 10.5, 19.5, 20.3, 24.2]})
>>> df
   vals
0   3.5
1   4.2
2  10.5
3  19.5
4  20.3
5  24.2
>>> df['range_start'] = np.floor(df['vals'] / 10) * 10
>>> df
   vals  range_start
0   3.5          0.0
1   4.2          0.0
2  10.5         10.0
3  19.5         10.0
4  20.3         20.0
5  24.2         20.0
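To turn range_start into labels like the ones in the question, one option (a sketch; the data and the ">=100" label are illustrative) is to format each lower edge as "lo-hi" and route large values to a catch-all bucket. Because floor-based ranges are closed on the left, a value of exactly 100 belongs to the top bucket, hence ">=100" rather than ">100".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"vals": [3.5, 10.5, 24.2, 100.0, 150.7]})

# Lower edge of each 10-wide range, computed as in the answer above
start = (np.floor(df["vals"] / 10) * 10).astype(int)

# Build "lo-hi" strings such as "0-10" and "20-30"
labels = start.astype(str) + "-" + (start + 10).astype(str)

# Everything at or above 100 goes into a single catch-all bucket
df["range"] = np.where(df["vals"] >= 100, ">=100", labels)
print(df)
```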

Upvotes: 0

Andrea Corbellini

Reputation: 17771

def get_range_for(x, start=0, stop=100, step=10):
    if x < start:
        return (float('-inf'), start)
    if x >= stop:
        return (stop, float('inf'))
    # Offset by start so the ranges still line up when start != 0
    left = start + step * ((x - start) // step)
    right = left + step
    return (left, right)

Examples:

>>> get_range_for(3.5)
(0.0, 10.0)
>>> get_range_for(27.3)
(20.0, 30.0)
>>> get_range_for(75.6)
(70.0, 80.0)

Corner cases:

>>> get_range_for(-100)
(-inf, 0)
>>> get_range_for(1234)
(100, inf)
>>> get_range_for(0)
(0, 10)
>>> get_range_for(10)
(10, 20)
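To build the new column the question asks for, one option is Series.apply. This sketch repeats the function so that it runs on its own (with the lower edge offset by start, so it also works when start is not 0) and tallies the ranges with collections.Counter; the sample data is made up. Note that these ranges are closed on the left, [0, 10), whereas pd.cut closes them on the right by default, (0, 10].

```python
from collections import Counter

import pandas as pd


def get_range_for(x, start=0, stop=100, step=10):
    """Return the half-open range [left, right) containing x as a tuple."""
    if x < start:
        return (float('-inf'), start)
    if x >= stop:
        return (stop, float('inf'))
    # Offset by start so the ranges still line up when start != 0
    left = start + step * ((x - start) // step)
    return (left, left + step)


df = pd.DataFrame({"number": [3.5, 7.0, 27.3, 75.6, 1234.0]})

# One (left, right) tuple per row
df["range"] = df["number"].apply(get_range_for)

# Tuples are hashable, so a Counter tallies how many values fall in each range
counts = Counter(df["range"])
print(counts)
```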

Upvotes: 0
