Jim O.

Reputation: 1111

Cumulative count of new values and duplicates by date in Python

I want to identify previous visitors to my restaurant. This is my data.

import pandas as pd
df = pd.DataFrame()
df['date'] = ['2020-01-01', '2020-01-01','2020-01-01',  
              '2020-01-02', '2020-01-02', '2020-01-02', 
              '2020-01-03', '2020-01-03', '2020-01-03',
              '2020-01-04', '2020-01-04', '2020-01-04']
df['value'] = ['Abe', 'Abe', 'Abe', 
               'Abe', 'Ben', 'Abe', 
               'Ben', 'Ben', 'Coco',
               'Abe', 'Abe', 'Abe']
df


    date        value
0   2020-01-01  Abe
1   2020-01-01  Abe
2   2020-01-01  Abe
3   2020-01-02  Abe
4   2020-01-02  Ben
5   2020-01-02  Abe
6   2020-01-03  Ben
7   2020-01-03  Ben
8   2020-01-03  Coco
9   2020-01-04  Abe
10  2020-01-04  Abe
11  2020-01-04  Abe

I want it to look like this:

    date        visitor_total
0   2020-01-01  1
1   2020-01-02  3
2   2020-01-03  5
3   2020-01-04  6

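On 2020-01-01, only Abe visited, so the visitor total is 1. On 2020-01-02, Abe and Ben visited (2 unique visitors), so the total becomes 3. On 2020-01-03, Ben visited twice and Coco visited once (2 unique visitors), so the total becomes 5. On 2020-01-04, only Abe visited (three times, counted once), so the total becomes 6. In other words, repeat visits on the same day count once, and each day's unique visitors are added to the running total.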

Thanks in advance!

Upvotes: 0

Views: 42

Answers (1)

vmouffron

Reputation: 428

Count the distinct visitors per date with nunique(), then take the running total with cumsum():

df.groupby(["date"])["value"].nunique().cumsum()
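The one-liner above returns a Series indexed by date and named value. To get the two-column layout from the question, something like the following sketch might work. It rebuilds the question's sample data so it runs on its own; the names daily_unique and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-02", "2020-01-02", "2020-01-02",
             "2020-01-03", "2020-01-03", "2020-01-03",
             "2020-01-04", "2020-01-04", "2020-01-04"],
    "value": ["Abe", "Abe", "Abe",
              "Abe", "Ben", "Abe",
              "Ben", "Ben", "Coco",
              "Abe", "Abe", "Abe"],
})

# Number of distinct visitors on each date (repeat visits that day count once)
daily_unique = df.groupby("date")["value"].nunique()

# Running total of the daily counts, turned back into a DataFrame
# with the column name used in the question's expected output
result = daily_unique.cumsum().reset_index(name="visitor_total")
print(result)
```

Note that this counts a returning visitor again on each new day, which is what the expected output in the question shows. If each person should be counted only once overall, a different approach would be needed.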

Upvotes: 1
