Creating a counter based on repetition of a value in dataframe

Question

I have a dataframe like this:

ID   day purchase
ID1  1   10  
ID1  2   15 
ID1  4   13
ID2  2   11
ID2  4   11
ID2  5   24
ID2  6   10

Desired output:

ID   day purchase Txn
ID1  1   10       1
ID1  2   15       2 
ID1  4   13       3 
ID2  2   11       1  
ID2  4   11       2
ID2  5   24       3 
ID2  6   10       4

So for each ID, i want to create a counter to keep a track of their transactions. In SAS, i would do something like First.ID then Txn=1 else Txn+1

How to do something like this in Python?

I got the idea of sorting by ID and day. But how to create customized counter?

sacuL · Accepted Answer

Here is one solution. Like you suggest, it involves sorting by ID and day (in case your original dataframe isn't), and then grouping by ID, creating a counter for each ID:

# Make sure your dataframe is sorted properly (first by ID, then by day)
df = df.sort_values(['ID', 'day'])

# group by ID
by_id = df.groupby('ID')

# Make a custom counter using the default index of dataframes (adding 1)
df['txn'] = by_id.apply(lambda x: x.reset_index()).index.get_level_values(1)+1

>>> df
    ID  day  purchase  txn
0  ID1    1        10    1
1  ID1    2        15    2
2  ID1    4        13    3
3  ID2    2        11    1
4  ID2    4        11    2
5  ID2    5        24    3
6  ID2    6        10    4

If your dataframe started out as not properly sorted, you can get back to the original order like this:

df = df.sort_index()

Creating a counter based on repetition of a value in dataframe

Answers (2)

Related Questions