Reputation: 435
Agent Amount repeat_count
A20 50 USD
A30 70 USD
A60 80 USD
A30 70 USD 1
A20 57 USD
A20 50 USD
SO, above is a small sample of my dataframe. I need to count sequential duplicates i.e. those agents which are transferring the same amount as they did in the previous transaction. For instance , A30 is transferring amount 70 USD twice in a row and hence I need to store the count. Even A20 is sending amount 50 USD twice , but between that it is also sending amount 57 USD . So , I do not want to store its count. Thanks in advance.
Upvotes: 0
Views: 488
Reputation: 16251
Let's use a larger example:
Agent Amount
A20 50
A30 70
A60 80
A30 70
A20 57
A20 50
A30 70
A30 80
A30 70
As usual with such a problem, we will use groupby
to work on each agent separately, so we can solve the problem for a single agent first. Let:
df1 = df[df.Agent == 'A30']
The following will find when the same amount is repeated:
df1.Amount.shift() == df1.Amount
So you can count the number of occurrences with a cumsum
:
In [11]: (df1.Amount.shift() == df1.Amount).cumsum()
Out[11]:
1 0
3 1
6 2
7 2
8 2
Let's apply the above solution to the original dataframe:
In [12]: df.groupby('Agent').apply(
lambda df1: (df1.Amount.shift() == df1.Amount).cumsum()
)
Out[12]:
Agent
A20 0 0
4 0
5 0
A30 1 0
3 1
6 2
7 2
8 2
A60 2 0
In order to merge the result with the original dataframe, we need to drop the first level of the index (the agents):
repeat_count = df.groupby('Agent').apply(
lambda df1: (df1.Amount.shift() == df1.Amount).cumsum()
)
pd.concat([df, repeat_count.reset_index(level=0, drop=True)], axis=1)
The function concat
will merge based on index values, so the results in repeat_count
are aligned with the original dataframe.
Upvotes: 1