Grouping numerical values in pandas

Question

My DataFrame has one column of numeric values, say distance. I want to find out which range of distances contains the largest number of rows.

Simply calling df.distance.value_counts() returns:

74         1
90         1
94         1
893        1
889        1
885        1
877        1
833        1
122        1
545        1

What I want is something like histogram buckets, so I am expecting output like this:

900         4 #all values < 900 and > 850
100         3
150         1
550         1
850         1

The only approach I've come up with so far, which I doubt is the best one, is to find the max and min values, divide the range by my step (50 in this case), and then loop over all the values, assigning each to the appropriate group.

Is there a better approach?

DrTRD · Accepted Answer

I'd suggest the following, assuming your value column is labeled val:

import numpy as np
# Label each row with the lower edge of its 50-wide bin (e.g. 74 -> 50.0, 893 -> 850.0)
df['bin'] = 50 * np.floor(df['val'] / 50)

You can then count the rows in each bin:

df.groupby('bin')['val'].count()
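
Note that flooring labels each bin by its lower edge, while the output in the question labels bins by their upper edge. A minimal sketch of two variations, assuming the column is named distance as in the question and using the ten sample values shown there: rounding up with np.ceil to get upper-edge labels, and pd.cut with explicit edges. The variable names (step, edges, counts, interval_counts) are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question; 'distance' is the column name used there.
df = pd.DataFrame({"distance": [74, 90, 94, 893, 889, 885, 877, 833, 122, 545]})
step = 50  # bucket width from the question

# Variation 1: label each row by the upper edge of its bucket (74 -> 100, 893 -> 900).
df["bin"] = (np.ceil(df["distance"] / step) * step).astype(int)
counts = df["bin"].value_counts()  # sorted by count, largest first
print(counts)

# Variation 2: let pd.cut assign interval labels such as (850, 900].
# Edges run from 0 up to the first multiple of step at or above the maximum.
edges = np.arange(0, df["distance"].max() + step, step)
interval_counts = pd.cut(df["distance"], bins=edges).value_counts()
print(interval_counts.head())
```

With the sample data, the 900 bucket holds 4 rows and the 100 bucket holds 3, which matches the counts the question asks for. pd.cut uses right-closed intervals by default, so a value of exactly 100 falls into (50, 100] in both variations.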
