Reputation: 369
I am new to Pandas and I have a dataframe like the one below:
id  values
1      2.1
2      0.8
3      1.0
4      3.2
I want to separate the column "values" into bins of width 2 and add a column "counts" that records how many rows fall into the same bin, like this:
id  values  counts
1      2.1       2    (since 2.1 and 3.2 both belong to the bin 2-4)
2      0.8       2
3      1.0       2
4      3.2       2
I know the value_counts function can calculate the frequencies, but I don't know how to append them back to the original dataframe.
Any help is much appreciated!
Upvotes: 3
Views: 1641
Reputation: 294586
Using NumPy's searchsorted to define bins and bincount to count them. This should be very fast.
import numpy as np

# This defines the bin edges;
# [1, 2, 3] would have created different bins
#                   v
b = np.searchsorted([2], df['values'].values)
df.assign(counts=np.bincount(b)[b])
   id  values  counts
0   1     2.1       2
1   2     0.8       2
2   3     1.0       2
3   4     3.2       2
np.searchsorted identifies where each element of the second array would need to be inserted into the first array in order to maintain sortedness:

2.1 goes after the 2, i.e. position 1
0.8 goes before the 2, i.e. position 0
1.0 goes before the 2, i.e. position 0
3.2 goes after the 2, i.e. position 1

np.bincount conveniently counts the frequency of integral bins... like the ones we just created. Indexing that result with b, as in np.bincount(b)[b], broadcasts each bin's count back to every row, much like a groupby transform with count.
Upvotes: 5
Reputation: 153560
Let's use pd.cut and groupby:
For two bins:
df.assign(counts=df.groupby(pd.cut(df['values'], bins=2))['values'].transform('count'))
Or if you want your bin size = 2:
df.assign(counts=df.groupby(pd.cut(df['values'], bins=[0,2,4]))['values'].transform('count'))
Output:
   id  values  counts
0   1     2.1     2.0
1   2     0.8     2.0
2   3     1.0     2.0
3   4     3.2     2.0
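For a self-contained version of the fixed-edges variant, here is a sketch that builds the question's frame from scratch; the astype(int) cast at the end is an optional extra that avoids the float counts shown above:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'values': [2.1, 0.8, 1.0, 3.2]})

# Bin edges chosen so each bin spans 2 units: (0, 2] and (2, 4]
bins = pd.cut(df['values'], bins=[0, 2, 4])

# transform('count') broadcasts each group's size back onto its rows
df['counts'] = df.groupby(bins)['values'].transform('count').astype(int)
print(df)
```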
Upvotes: 3