sebap123
sebap123

Reputation: 2685

Grouping numerical values in pandas

In my Dataframe I have one column with numeric values, let say - distance. I want to find out which group of distance (range) have the biggest number of records (rows).

Doing simple: df.distance.count_values() returns:

74         1
90         1
94         1
893        1
889        1
885        1
877        1
833        1
122        1
545        1

What I want to achieve is something like buckets from histogram, so I am expecting output like this:

900         4 #all values < 900 and > 850
100         3
150         1
550         1
850         1

The one approach I've figured out so far, but I don't think is the best and most optimal one is just find max and min values, divide by my step (50 in this case) and then do loop checking all the values and assigning to appropriate group.

Is there any other, better approach for that?

Upvotes: 4

Views: 5597

Answers (2)

sebap123
sebap123

Reputation: 2685

Thanks to EdChum suggestion and based on this example I've figured out, the best way (at least for me) is to do something like this:

import numpy as np

step = 50

#...

max_val = df.distance.max()
bins = list(range(0,int(np.ceil(max_val/step))*step+step,step))
clusters = pd.cut(df.distance,bins,labels=bins[1:])

Upvotes: 1

DrTRD
DrTRD

Reputation: 1718

I'd suggest doing the following, assuming your value column is labeled val

import numpy as np
df['bin'] = df['val'].apply(lambda x: 50*np.floor(x/50))

The result is the following:

df.groupby('bin')['val'].count()

Upvotes: 3

Related Questions