Reputation: 365
Take this df
:
df = pd.DataFrame({'client_id':[0, 0, 0, 1, 1, 1, 2, 2, 2],
'key':['0_382','0_382','0_356','1_365',float('nan'),'1_365',float('nan'),'2_284','2_405'],
'operation':['buy','sell','sell','buy','transfer','buy','fee','buy','buy']})
client_id key operation
0 0 0_382 buy
1 0 0_382 sell
2 0 0_356 sell
3 1 1_365 buy
4 1 NaN transfer
5 1 1_365 buy
6 2 NaN fee
7 2 2_284 buy
8 2 2_405 buy
I need to create a column named pos_id
that will give an incremental value (1,2,3...) for each row, for unique values of client_id
and key
, and using a conditional to skip transfer
and fee
values of operation
.
The result should be like this:
client_id key operation pos_id
0 0 0_382 buy 1
1 0 0_382 sell 1
2 0 0_356 sell 2
3 1 1_365 buy 1
4 1 NaN transfer NaN
5 1 1_365 buy 1
6 2 NaN fee NaN
7 2 2_284 buy 1
8 2 2_405 buy 2
Upvotes: 1
Views: 62
Reputation: 59579
Here are two ways.
The first method groups ['client_id', 'key']
to the same 'pos_id'
within 'client_id'
regardless of whether or not they appear consecutively.
Use where
to mask the rows you want to ignore and then groupby
+ ngroup
with sort=False
will count the unique combinations. Then subtract off the min within each group to get the counter starting at 1.
s = (df.where(~df['operation'].isin(['transfer', 'fee']))
.groupby(['client_id', 'key'], sort=False).ngroup()
.replace(-1, np.NaN)) # ngroup makes NaN group keys -1.
df['pos_id'] = s - s.groupby(df['client_id']).transform('min') + 1
This method requires the input sorted on at least 'client_id'
and then will only group the same key if they are consecutive into the same 'pos_id'
. Remove the rows you want to ignore, then check for differences in each row and take the cumsum
within 'client_id'
s = (df.where(~df['operation'].isin(['transfer', 'fee']))
.dropna(how='all'))
s = s['key'].ne(s['key'].shift()) | s['client_id'].ne(s['client_id'].shift())
df['pos_id'] = s.groupby(df['client_id']).cumsum()
For your input either results in:
client_id key operation pos_id
0 0 0_382 buy 1.0
1 0 0_382 sell 1.0
2 0 0_356 sell 2.0
3 1 1_365 buy 1.0
4 1 NaN transfer NaN
5 1 1_365 buy 1.0
6 2 NaN fee NaN
7 2 2_284 buy 1.0
8 2 2_405 buy 2.0
Upvotes: 1