Reputation: 1393
I have a dataframe like this:
ID day purchase
ID1 1 10
ID1 2 15
ID1 4 13
ID2 2 11
ID2 4 11
ID2 5 24
ID2 6 10
Desired output:
ID day purchase Txn
ID1 1 10 1
ID1 2 15 2
ID1 4 13 3
ID2 2 11 1
ID2 4 11 2
ID2 5 24 3
ID2 6 10 4
So for each ID, i want to create a counter to keep a track of their transactions. In SAS, i would do something like First.ID then Txn=1 else Txn+1
How to do something like this in Python?
I got the idea of sorting by ID and day. But how to create customized counter?
Upvotes: 1
Views: 51
Reputation: 51335
Here is one solution. Like you suggest, it involves sorting by ID and day (in case your original dataframe isn't), and then grouping by ID, creating a counter for each ID:
# Make sure your dataframe is sorted properly (first by ID, then by day)
df = df.sort_values(['ID', 'day'])
# group by ID
by_id = df.groupby('ID')
# Make a custom counter using the default index of dataframes (adding 1)
df['txn'] = by_id.apply(lambda x: x.reset_index()).index.get_level_values(1)+1
>>> df
ID day purchase txn
0 ID1 1 10 1
1 ID1 2 15 2
2 ID1 4 13 3
3 ID2 2 11 1
4 ID2 4 11 2
5 ID2 5 24 3
6 ID2 6 10 4
If your dataframe started out as not properly sorted, you can get back to the original order like this:
df = df.sort_index()
Upvotes: 2
Reputation: 12147
The simplest method I could come up with, definitely not the most efficient though.
df['txn'] = [0]*len(df)
prev_ID = None
for index, row in df.iterrows():
if row['ID'] == prev_ID:
df['txn'][index] = counter
counter += 1
else:
prev_ID = row['ID']
df['txn'][index] = 1
counter = 2
outputs
ID day purchase txn
0 ID1 1 10 1
1 ID1 2 15 2
2 ID1 4 13 3
3 ID2 2 11 1
4 ID2 4 11 2
5 ID2 5 24 3
6 ID2 6 10 4
Upvotes: 0