Adders
Adders

Reputation: 665

Pandas: Group by column and count values in range in another column and add that count to a new column

I have the following:

import pandas as pd
import numpy as np

df =pd.DataFrame([['url/1', 'url/2', 'url/3', 'url/2', 'url/2'], [1, 20, 6, 12, 8]] ).T  

df.columns = ['page', 'position']  

print(df)

    page position
0  url/1        1
1  url/2       20
2  url/3        6
3  url/2       12
4  url/2        8

Now, I want to count all positions in a range (between 0 and 10) grouped by each page and then the count added to a new column.

I currently have this (from other SO searches):

print(df.groupby('page')['position'].value_counts(bins=[0, 10], sort=False))

page   position      
url/1  (-0.001, 10.0]    1
url/2  (-0.001, 10.0]    1
url/3  (-0.001, 10.0]    1
Name: position, dtype: int64

However, I'm not sure how to apply the count to each row?

Upvotes: 1

Views: 592

Answers (2)

sophocles
sophocles

Reputation: 13821

You could create a binning column displaying whether each position is between 0,10 and then use a pivot_table with aggfunc set to count:

df['threshold'] = np.where(df['position'].between(0,10),'within 10','outside of 10')
df.pivot_table(index='page', columns='threshold', values='position', aggfunc='count',fill_value=0)

prints:

threshold  outside of 10  within 10
page                               
url/1                  0          1
url/2                  2          1
url/3                  0          1

Upvotes: 3

mozway
mozway

Reputation: 260455

You can define more bins and unstack the output:

(df.groupby('page')['position']
   .value_counts(bins=[0, 10, float('inf')], sort=False)
   .unstack('position')
)

output:

position  (-0.001, 10.0]  (10.0, inf]
page                                 
url/1                  1            0
url/2                  1            2
url/3                  1            0

Upvotes: 2

Related Questions