I have a df sorted by EmpID and Date . Now I need to update the Reason column of the first occurrence for each EmpID with the value 100. Original: EmpID Date Reason 123 01/01/2022 0 123 01/02/2022 500 123 01/03/2022 0 124 01/01/2022 0 124 01/02/2022 800 124 01/03/2022 0 Result: EmpID Date Reason 123 01/01/2022 100 123 01/02/2022 500 123 01/03/2022 0 124 01/01/2022 100 124 01/02/2022 800 124 01/03/2022 0 I'm not getting close in any attempts. Would it be easier to update the min date of each EmpID and set Reason to 100? I have tried the following but received an error. DF.loc[DF.groupby(['EmpID','Date'])['Reason'].head(1), 'Reason'] = '100' This raises: KeyError: "None of [Float64Index([ 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0,\n 800.0,\n ...\n 800.0, 0.0, 100.0, 800.0, 0.0, 100.0, 800.0, 0.0, 100.0,\n 800.0],\n dtype='float64', name='EEID', length=12769)] are in the [index]" Thank you for any help!

Reputation: 35

Replace the first value of each group

I have a df sorted by EmpID and Date. Now I need to update the Reason column of the first occurrence for each EmpID with the value 100.

Original:

EmpID	Date	Reason
123	01/01/2022	0
123	01/02/2022	500
123	01/03/2022	0
124	01/01/2022	0
124	01/02/2022	800
124	01/03/2022	0

Result:

EmpID	Date	Reason
123	01/01/2022	100
123	01/02/2022	500
123	01/03/2022	0
124	01/01/2022	100
124	01/02/2022	800
124	01/03/2022	0

I'm not getting close in any attempts. Would it be easier to update the min date of each EmpID and set Reason to 100?

I have tried the following but received an error.

DF.loc[DF.groupby(['EmpID','Date'])['Reason'].head(1), 'Reason'] = '100'

This raises:

KeyError: "None of [Float64Index([ 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0,\n 800.0,\n ...\n 800.0, 0.0, 100.0, 800.0, 0.0, 100.0, 800.0, 0.0, 100.0,\n 800.0],\n dtype='float64', name='EEID', length=12769)] are in the [index]"

Thank you for any help!

Upvotes: 3

Answers (3)

Noman

Reputation: 163

for solving this problem we need to identify the row number if we get the row number we can filter based on 1st row and change the reason to 100. we can get row number using this function cumcount

data['row_num'] =data.groupby(['EmpID']).cumcount()
data.loc[data.row_num == 0,'Reason'] = 100
data

Upvotes: 0

BENY

Reputation: 323396

Check duplicated

df.loc[~df['EmpID'].duplicated(),'Reason'] = 100

Upvotes: 3

Vladimir Fokow

Reputation: 3883

You can use .transform. Just define a function that returns a mask: with all the values False except for the first one: True.

With the combined mask of all the groups you can then select and replace the elements that you need:

def f(s):
    mask = np.full(len(s), False)
    mask[0] = True
    return mask

mask = df.groupby('EmpID')['Reason'].transform(f)
df.loc[mask, 'Reason'] = 100

Result:

   EmpID       Date  Reason
0    123 2022-01-01     100
1    123 2022-01-02     500
2    123 2022-01-03       0
3    124 2022-01-01     100
4    124 2022-01-02     800
5    124 2022-01-03       0

Upvotes: 0

Replace the first value of each group

Answers (3)

Related Questions