Reputation: 665
I have the following:
import pandas as pd
import numpy as np
df =pd.DataFrame([['url/1', 'url/2', 'url/3', 'url/2', 'url/2'], [1, 20, 6, 12, 8]] ).T
df.columns = ['page', 'position']
print(df)
page position
0 url/1 1
1 url/2 20
2 url/3 6
3 url/2 12
4 url/2 8
Now, I want to count all positions in a range (between 0 and 10) grouped by each page and then the count added to a new column.
I currently have this (from other SO searches):
print(df.groupby('page')['position'].value_counts(bins=[0, 10], sort=False))
page position
url/1 (-0.001, 10.0] 1
url/2 (-0.001, 10.0] 1
url/3 (-0.001, 10.0] 1
Name: position, dtype: int64
However, I'm not sure how to apply the count to each row?
Upvotes: 1
Views: 592
Reputation: 13821
You could create a binning column displaying whether each position is between 0,10 and then use a pivot_table
with aggfunc
set to count
:
df['threshold'] = np.where(df['position'].between(0,10),'within 10','outside of 10')
df.pivot_table(index='page', columns='threshold', values='position', aggfunc='count',fill_value=0)
prints:
threshold outside of 10 within 10
page
url/1 0 1
url/2 2 1
url/3 0 1
Upvotes: 3
Reputation: 260455
You can define more bins and unstack
the output:
(df.groupby('page')['position']
.value_counts(bins=[0, 10, float('inf')], sort=False)
.unstack('position')
)
output:
position (-0.001, 10.0] (10.0, inf]
page
url/1 1 0
url/2 1 2
url/3 1 0
Upvotes: 2