ELI
ELI

Reputation: 369

pandas: How to append the bin values back to the original dataframe

I am new to Pandas and I have a dataframe like below

id    values   
 1       2.1
 2       0.8  
 3       1.0
 4       3.2

And I want to separate the columns "values" into different bins, like bin =2 and add a column "counts" which represents how many rows fall inside the bin, such as:

id     values   counts
 1        2.1       2 (since 2.1 and 3.2 both belong to the bin 2-4)
 2        0.8       2 
 3        1.0       2
 4        3.2       2

I know value_counts function can calculate the frequency, but I dont know how to append them back to the original dataframe.

Any help is much appreciated!

Upvotes: 3

Views: 1641

Answers (2)

piRSquared
piRSquared

Reputation: 294586

Using numpys searchsorted to define bins and bincount to count them.
This should be very fast.

#         This defines the bin edges
#        [1, 2, 3] would have created
#               different bins
#                    v
b = np.searchsorted([2], df['values'].values)
df.assign(counts=np.bincount(b)[b])

   id  values  counts
0   1     2.1       2
1   2     0.8       2
2   3     1.0       2
3   4     3.2       2

  • np.searchsorted identifies where in the first array every element of the second array needs to be placed in order to maintain sortedness.
    • That means:
    • 2.1 needs to go after the 2 which is a position of 1.
    • 0.8 needs to go before the 2 which is a position of 0.
    • 1.0 needs to go before the 2 which is a position of 0.
    • 3.2 needs to go after the 2 which is a position of 1.
  • np.bincount conveniently counts the frequency of integral bins... like the ones we just created.
  • By slicing the counted bins by the occurrence of the bins, we get a transform like count

Upvotes: 5

Scott Boston
Scott Boston

Reputation: 153560

Let' use pd.cut and groupby:

For two bins:

df.assign(counts=df.groupby(pd.cut(df['values'], bins=2))['values'].transform('count'))

Or if you want your bin size = 2:

df.assign(counts=df.groupby(pd.cut(df['values'], bins=[0,2,4]))['values'].transform('count'))

Output:

   id  values  counts
0   1     2.1     2.0
1   2     0.8     2.0
2   3     1.0     2.0
3   4     3.2     2.0

Upvotes: 3

Related Questions