Reputation: 777
I have a pandas DataFrame called df, sorted in chronological order. Each row is a visit on a website.
df has a column named display that indicates the number of times a specific page has been displayed during the visit. This column is populated by integers, 0 or greater.
df also has a user column.
I want to know how many times each user visited the site before ever seeing the business-critical page I'm interested in.
To know that, I need a user-indexed Series populated as follows:
display is non-zero (meaning, the first visit where the user saw the page)
Upvotes: 0
Views: 4966
Reputation: 375415
I think it's easier to use plain ol' argmax:
In [9]: import numpy as np
In [10]: import pandas as pd
In [11]: df = pd.DataFrame([[1, 0], [1, 0], [1, 1], [2, 0], [2, 1]], columns=['user', 'display'])
In [12]: df
Out[12]:
   user  display
0     1        0
1     1        0
2     1        1
3     2        0
4     2        1
In [13]: df.groupby('user')['display'].apply(lambda x: np.argmax(x.values))
Out[13]:
user
1    2
2    1
Name: display, dtype: int64
Although, for the sake of clarity (or if display could take values other than 0 and 1), I would define a new column:
In [21]: df['seen'] = df['display'] > 0
In [22]: df.groupby('user')['seen'].apply(lambda x: np.argmax(x.values))
Out[22]:
user
1    2
2    1
Name: seen, dtype: int64
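To illustrate why the boolean column matters when display holds counts larger than 1, here is a small sketch on made-up data (the visits frame and its values are hypothetical, not taken from the question). np.argmax returns the position of the largest value, which coincides with the first non-zero visit only when the values are 0/1 or boolean.

```python
import numpy as np
import pandas as pd

# Hypothetical visits: user 1 sees the page once on the 2nd visit
# and three times on the 3rd visit.
visits = pd.DataFrame({"user": [1, 1, 1, 2, 2], "display": [0, 1, 3, 0, 2]})

# argmax on the raw counts finds the largest count, not the first non-zero one.
raw = visits.groupby("user")["display"].apply(lambda x: int(np.argmax(x.values)))

# argmax on the boolean mask finds the first True,
# i.e. the first visit with at least one display.
mask = visits["display"] > 0
seen = mask.groupby(visits["user"]).apply(lambda x: int(np.argmax(x.values)))

# raw:  user 1 -> 2, user 2 -> 1
# seen: user 1 -> 1, user 2 -> 1
```

For user 1 the two results differ, which is the case the extra column protects against.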
Note: my old answer said df.groupby('user')['display'].apply(np.argmax), which wasn't quite correct as this gave the index (label) of the first True, not its position within the user's visits.
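One caveat worth checking with this approach: np.argmax returns 0 for a group with no True at all, so a user who never saw the page gets the same value as a user who saw it on the very first visit. A minimal sketch of one way to tell them apart, on made-up data; the -1 sentinel and the name first_seen_pos are arbitrary choices for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: user 2 sees the page on the first visit, user 3 never does.
df = pd.DataFrame({"user": [1, 1, 2, 3, 3], "display": [0, 1, 1, 0, 0]})
seen = df["display"] > 0
grouped = seen.groupby(df["user"])

# Position of the first True per user (0 when the group has no True at all).
first_seen_pos = grouped.apply(lambda x: int(np.argmax(x.values)))

# Replace the value with a sentinel (-1 here) for users with no display at all.
first_seen_pos = first_seen_pos.where(grouped.any(), -1)

# user 1 -> 1, user 2 -> 0, user 3 -> -1
```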
Upvotes: 2
Reputation: 777
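import numpy as np

def nvisits_before_display(x):
    try:
        # 1-based position of the first visit where the page was displayed
        return np.where(x > 0)[0].item(0) + 1
    except IndexError:
        # no visit with a display for this user
        return 0

df.groupby('user').display.apply(nvisits_before_display)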
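As a quick sanity check, the function can be run on a small made-up frame that includes a user who never sees the page (the data below is hypothetical and only meant to exercise both branches).

```python
import numpy as np
import pandas as pd

def nvisits_before_display(x):
    """1-based position of the first visit with a display, or 0 if there is none."""
    try:
        return np.where(x > 0)[0].item(0) + 1
    except IndexError:
        # np.where found no match, so the page was never displayed.
        return 0

# Hypothetical visits: user 1 sees the page on visit 3, user 2 on visit 1,
# and user 3 never sees it.
df = pd.DataFrame({"user": [1, 1, 1, 2, 3, 3], "display": [0, 0, 2, 1, 0, 0]})
result = df.groupby("user")["display"].apply(nvisits_before_display)

# user 1 -> 3, user 2 -> 1, user 3 -> 0
```

Because the comparison is x > 0 rather than a search for the maximum, counts larger than 1 do not affect the result.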
What does this mean?
x > 0, when applied to the column display, means that the page has been displayed on a given visit
np.where(<condition>)[0] returns a numpy.ndarray containing the positions (ordered integers) where the condition is met
item(0) takes the first of these positions, meaning the first visit where the page has been displayed
+ 1 turns the 0-based position into a 1-based visit number, so users who saw the page on their first visit get the value 1
except IndexError returns 0 for users who never saw the page, since item(0) fails on an empty array
groupby('user') applies the nvisits_before_display function to the rows belonging to each user
Upvotes: 2