RDJ

Reputation: 4122

Pandas: groupby and return mean value of a non-numeric variable

I'm trying to compute the mean number of visits per page for each visitor to a website, grouping by visitor ID and by the page visited.

In the example data below, unique_visit identifies an individual visit, visitor_id is who visited, and page is the page they visited.

For example, visitor 009903 visited page 3ghtr three times, page 4ifac once and page 9fgvb once, so the mean page visits of visitor 009903 is x. I want to compute this for each visitor.

I'm aiming to return an aggregated DataFrame / Series where the first column is the visitor_id and the second column is the mean, as 'int64'.

unique_visit   visitor_id   page    time
6789988        009903       4ifac   07:01
1978678        001068       9fgvb   11:04
7179832        001624       3ghtr   21:22
4567891        001068       4ifac   16:57
2374852        009903       3ghtr   14:39
2179435        001624       4ifac   21:02
3449855        009903       3ghtr   13:23
6789870        009903       9fgvb   09:34
3439455        009903       3ghtr   14:51

Upvotes: 0

Views: 1122

Answers (1)

joris

Reputation: 139172

You can first count the number of visits per visitor / per page (with groupby):

In [11]: df.groupby(['visitor_id', 'page'])['unique_visit'].count()
Out[11]:
visitor_id  page
1068        4ifac    1
            9fgvb    1
1624        3ghtr    1
            4ifac    1
9903        3ghtr    3
            4ifac    1
            9fgvb    1
Name: unique_visit, dtype: int64

From this, you can take the mean for all pages (second level of the index) per visitor:

In [13]: df.groupby(['visitor_id', 'page'])['unique_visit'].count().groupby(level=0).mean()
Out[13]:
visitor_id
1068    1.000000
1624    1.000000
9903    1.666667
Name: unique_visit, dtype: float64
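
For reference, here is a self-contained sketch of the two steps, built from the sample rows in the question. It assumes visitor_id is stored as a string (so the leading zeros are kept, unlike in the output above) and uses groupby(level=...).mean() for the second step, since the level argument of Series.mean is not available in pandas 2.0 and later. The variable names visits_per_page and mean_visits are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question. visitor_id is kept as a string here
# (an assumption) so that the leading zeros are preserved.
df = pd.DataFrame({
    "unique_visit": [6789988, 1978678, 7179832, 4567891, 2374852,
                     2179435, 3449855, 6789870, 3439455],
    "visitor_id": ["009903", "001068", "001624", "001068", "009903",
                   "001624", "009903", "009903", "009903"],
    "page": ["4ifac", "9fgvb", "3ghtr", "4ifac", "3ghtr",
             "4ifac", "3ghtr", "9fgvb", "3ghtr"],
    "time": ["07:01", "11:04", "21:22", "16:57", "14:39",
             "21:02", "13:23", "09:34", "14:51"],
})

# Step 1: number of visits per (visitor, page) pair.
visits_per_page = df.groupby(["visitor_id", "page"])["unique_visit"].count()

# Step 2: average those counts over the pages of each visitor
# (the first level of the MultiIndex).
mean_visits = visits_per_page.groupby(level="visitor_id").mean()

print(mean_visits)
```

The result is a float Series indexed by visitor_id; for visitor 009903 it is (3 + 1 + 1) / 3.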

Upvotes: 1
