Reputation: 4122
I'm trying to compute the mean number of visits per page for each visitor to a website, grouping by visitor ID and by the pages they visited.
In the example data below, unique_visit identifies an individual visit, visitor_id is who visited, and page is the page they visited.
I want to compute the following: visitor 009903 visited page 3ghtr three times, page 4ifac once and page 9fgvb once. The mean visits per page for visitor 009903 is therefore (3 + 1 + 1) / 3 ≈ 1.67. I want to do this for each visitor.
I'm aiming to return an aggregated DataFrame / Series where column 1 would be the visitor_id and column 2 would be a mean number as an 'int64'.
unique_visit  visitor_id  page   time
6789988       009903      4ifac  07:01
1978678       001068      9fgvb  11:04
7179832       001624      3ghtr  21:22
4567891       001068      4ifac  16:57
2374852       009903      3ghtr  14:39
2179435       001624      4ifac  21:02
3449855       009903      3ghtr  13:23
6789870       009903      9fgvb  09:34
3439455       009903      3ghtr  14:51
Upvotes: 0
Views: 1122
Reputation: 139172
You can first count the number of visits per visitor / per page (with groupby):
In [11]: df.groupby(['visitor_id', 'page'])['unique_visit'].count()
Out[11]:
visitor_id  page
1068        4ifac    1
            9fgvb    1
1624        3ghtr    1
            4ifac    1
9903        3ghtr    3
            4ifac    1
            9fgvb    1
Name: unique_visit, dtype: int64
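To try the counting step on the sample data from the question, a minimal sketch might look like the following. It assumes visitor_id is stored as a string so the leading zeros survive (the output above shows it read as an integer); the variable name visits_per_page is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question. visitor_id is kept as a string here
# (an assumption) so that leading zeros such as "009903" are preserved.
df = pd.DataFrame(
    {
        "unique_visit": [6789988, 1978678, 7179832, 4567891, 2374852,
                         2179435, 3449855, 6789870, 3439455],
        "visitor_id": ["009903", "001068", "001624", "001068", "009903",
                       "001624", "009903", "009903", "009903"],
        "page": ["4ifac", "9fgvb", "3ghtr", "4ifac", "3ghtr",
                 "4ifac", "3ghtr", "9fgvb", "3ghtr"],
        "time": ["07:01", "11:04", "21:22", "16:57", "14:39",
                 "21:02", "13:23", "09:34", "14:51"],
    }
)

# One entry per (visitor_id, page) pair; the value is the number of visits.
# The result is a Series with a two-level MultiIndex.
visits_per_page = df.groupby(["visitor_id", "page"])["unique_visit"].count()
print(visits_per_page)
```

With string IDs, the index shows 009903 rather than 9903, but the counts are the same.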
From this, you can take the mean for all pages (second level of the index) per visitor:
In [13]: df.groupby(['visitor_id', 'page'])['unique_visit'].count().groupby(level=0).mean()
Out[13]:
visitor_id
1068    1.000000
1624    1.000000
9903    1.666667
Name: unique_visit, dtype: float64
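As a self-contained sketch of the averaging step, the version below groups on the first index level and takes the mean, which works on current pandas releases (the level argument of Series.mean was removed in pandas 2.0). The string visitor_id values and the final rounding to 'int64' are assumptions: the question asks for an integer result, and rounding is only one possible way to get there, at the cost of the fractional part.

```python
import pandas as pd

# Same sample data as in the question; visitor_id as strings is an assumption.
df = pd.DataFrame(
    {
        "unique_visit": [6789988, 1978678, 7179832, 4567891, 2374852,
                         2179435, 3449855, 6789870, 3439455],
        "visitor_id": ["009903", "001068", "001624", "001068", "009903",
                       "001624", "009903", "009903", "009903"],
        "page": ["4ifac", "9fgvb", "3ghtr", "4ifac", "3ghtr",
                 "4ifac", "3ghtr", "9fgvb", "3ghtr"],
    }
)

# Step 1: number of visits per (visitor_id, page) pair.
counts = df.groupby(["visitor_id", "page"])["unique_visit"].count()

# Step 2: average those counts over the pages of each visitor.
mean_visits = counts.groupby(level="visitor_id").mean()
print(mean_visits)

# Optional (assumption): round to the nearest integer and cast to int64,
# since the question asks for an integer column.
mean_visits_int = mean_visits.round().astype("int64")
print(mean_visits_int)
```

Calling mean_visits.reset_index(name="mean_visits") would turn the Series into a two-column DataFrame if that shape is preferred.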
Upvotes: 1