Reputation: 35
I have a df
sorted by EmpID
and Date
. Now I need to update the Reason
column of the first occurrence for each EmpID
with the value 100.
Original:
EmpID | Date | Reason |
---|---|---|
123 | 01/01/2022 | 0 |
123 | 01/02/2022 | 500 |
123 | 01/03/2022 | 0 |
124 | 01/01/2022 | 0 |
124 | 01/02/2022 | 800 |
124 | 01/03/2022 | 0 |
Result:
EmpID | Date | Reason |
---|---|---|
123 | 01/01/2022 | 100 |
123 | 01/02/2022 | 500 |
123 | 01/03/2022 | 0 |
124 | 01/01/2022 | 100 |
124 | 01/02/2022 | 800 |
124 | 01/03/2022 | 0 |
I'm not getting close in any attempts. Would it be easier to update the min date of each EmpID
and set Reason
to 100?
I have tried the following but received an error.
DF.loc[DF.groupby(['EmpID','Date'])['Reason'].head(1), 'Reason'] = '100'
This raises:
KeyError: "None of [Float64Index([ 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0,\n 800.0,\n ...\n 800.0, 0.0, 100.0, 800.0, 0.0, 100.0, 800.0, 0.0, 100.0,\n 800.0],\n dtype='float64', name='EEID', length=12769)] are in the [index]"
Thank you for any help!
Upvotes: 3
Views: 795
Reputation: 163
for solving this problem we need to identify the row number if we get the row number we can filter based on 1st row and change the reason to 100. we can get row number using this function cumcount
data['row_num'] =data.groupby(['EmpID']).cumcount()
data.loc[data.row_num == 0,'Reason'] = 100
data
Upvotes: 0
Reputation: 3883
You can use .transform
. Just define a function that returns a mask: with all the values False
except for the first one: True
.
With the combined mask of all the groups you can then select and replace the elements that you need:
def f(s):
mask = np.full(len(s), False)
mask[0] = True
return mask
mask = df.groupby('EmpID')['Reason'].transform(f)
df.loc[mask, 'Reason'] = 100
Result:
EmpID Date Reason
0 123 2022-01-01 100
1 123 2022-01-02 500
2 123 2022-01-03 0
3 124 2022-01-01 100
4 124 2022-01-02 800
5 124 2022-01-03 0
Upvotes: 0