Reputation: 445
I have a dataframe with multiple values and would like to group by the 'email' column, retrieve the first and last row of each group, and compare them to see whether the category column changed. For example, if the category goes from MGR to MGR, there is no change; if it goes from EMP to MGR, that reflects a change in status.
date email category
13-04-2018 [email protected] MGR
13-04-2018 [email protected] EMP
18-04-2018 [email protected] EMP
20-04-2018 [email protected] MGR
11-01-2019 [email protected] MGR
15-10-2019 [email protected] MGR
16-11-2019 [email protected] MGR
31-01-2020 [email protected] EMP
02-05-2020 [email protected] MGR
05-08-2020 [email protected] MGR
14-02-2021 [email protected] MGR
15-02-2021 [email protected] MGR
I would like to get the following results:
date email category status
13-04-2018 [email protected] MGR no change
15-10-2019 [email protected] MGR no change
13-04-2018 [email protected] EMP change
15-02-2021 [email protected] MGR change
18-04-2018 [email protected] EMP change
16-11-2019 [email protected] MGR change
20-04-2018 [email protected] MGR no change
05-08-2020 [email protected] MGR no change
31-01-2020 [email protected] EMP change
14-02-2021 [email protected] MGR change
I've tried the following code, but it seems to only retrieve the first and last rows based on the groupby. Is there some method to compare the values between the first and last row?
#get the first and last row of the groupby
df2 = df.groupby('email', as_index=False).nth([0,-1])
I'd appreciate any help, thank you.
Upvotes: 0
Views: 795
Reputation: 13349
try:
This checks whether the first and last row of each group have the same category value; if the value changes, it sets a flag on both rows.
import numpy as np

# take the first and last row of a group
fl = lambda s: s.iloc[[0, -1]]

# flag both boundary rows of a group when the first and last category differ
res = df.groupby('email')['category'].apply(lambda x: fl(x).shift(1).ne(fl(x)) & (fl(x).nunique() > 1))
res.index = res.index.droplevel()

df['status'] = res
df.dropna(inplace=True)                # keep only the first/last row of each group
df['status'] = np.where(df.status, 'Change', 'No Change')
df.sort_values(by='email')
   | date | email | category | status |
---|------|-------|----------|--------|
0 | 13-04-2018 | [email protected] | MGR | No Change |
5 | 15-10-2019 | [email protected] | MGR | No Change |
2 | 18-04-2018 | [email protected] | EMP | Change |
6 | 16-11-2019 | [email protected] | MGR | Change |
1 | 13-04-2018 | [email protected] | EMP | Change |
11 | 15-02-2021 | [email protected] | MGR | Change |
3 | 20-04-2018 | [email protected] | MGR | No Change |
9 | 05-08-2020 | [email protected] | MGR | No Change |
7 | 31-01-2020 | [email protected] | EMP | Change |
10 | 14-02-2021 | [email protected] | MGR | Change |
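As a side note (my own sketch, not part of the code above): the same first-vs-last comparison can be done without apply by aggregating each group's first and last category and mapping the result back onto the boundary rows. This assumes df holds the question's email and category columns.

import numpy as np

# first and last category per email, compared directly
first_last = df.groupby('email')['category'].agg(['first', 'last'])
changed = first_last['first'].ne(first_last['last'])   # True where the category changed

# keep only the first and last row of each group and attach the flag
ends = df.groupby('email', as_index=False).nth([0, -1]).copy()
ends['status'] = np.where(ends['email'].map(changed), 'Change', 'No Change')
print(ends.sort_values('email'))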
Upvotes: 0
Reputation: 124
Not sure if it is efficient enough, but it works well.
def check_status(group):
    # select only the first and last row of the group
    selected = [False] * len(group)
    selected[0] = selected[-1] = True
    new_group = group[selected].copy()
    # if the two categories differ, the status changed
    new_group['status'] = 'change' if new_group.category.is_unique else 'no change'
    return new_group

print(df.groupby('email').apply(check_status).reset_index(drop=True))
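For a quick sanity check, here is a hypothetical two-row mini example (made up for illustration, not taken from the question's data) showing the labeling:

import pandas as pd

# hypothetical mini example: one email whose category goes EMP -> MGR
mini = pd.DataFrame({'date': ['01-01-2020', '01-02-2020'],
                     'email': ['[email protected]', '[email protected]'],
                     'category': ['EMP', 'MGR']})

print(mini.groupby('email').apply(check_status).reset_index(drop=True))
# both rows come back with status 'change' because the first and last categories differ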
Upvotes: 2