Create a column based of a status in another column

Question

can you help me please?

I have the following table in my dataframe:

DATE	CUSTOMER	STATUS
01/01/2022	A	ACTIVATED
02/01/2022	A	ACTIVE
03/01/2022	A	INACTIVE
04/01/2022	A	EXCEPTION
05/01/2022	A	INACTIVATED

I want to mark my dataframe row when my client inactivated (status inactivated). However, I only want to catch when the status at least once has been ACTIVATED. If my client's status is INACTIVATED, but it hasn't passed at least once as ACTIVED, it doesn't matter to me. And also I want to get just only when INACTIVATED is AFTER ACTIVATED.

Could you help me how I can do this in python? I'm having difficulties, here are 2 examples of result I would like to get.

1o option:

DATE	CUSTOMER	STATUS	ACTIVATED/INACTIVATED
01/01/2022	A	ACTIVATED	1
02/01/2022	A	ACTIVE	1
03/01/2022	A	INACTIVE	1
04/01/2022	A	EXCEPTION	1
05/01/2022	A	INACTIVATED	1

2o option:

DATE	CUSTOMER	STATUS	ACTIVATED/INACTIVATED
01/01/2022	A	ACTIVATED
02/01/2022	A	ACTIVE
03/01/2022	A	INACTIVE
04/01/2022	A	EXCEPTION
05/01/2022	A	INACTIVATED	1

Thank you very much!

jezrael · Accepted Answer

You can compare STATUS per groups and test for values s - 1 if exist both, 0 if no:

s = {'ACTIVATED','INACTIVATED'}
df['ACTIVATED/INACTIVATED'] = (df.groupby('CUSTOMER')['STATUS']
                                 .transform(lambda x: set(x) >= s)
                                 .astype(int))
print (df)
         DATE CUSTOMER       STATUS  ACTIVATED/INACTIVATED
0  01/01/2022        A    ACTIVATED                      1
1  02/01/2022        A       ACTIVE                      1
2  03/01/2022        A     INACTIVE                      1
3  04/01/2022        A    EXCEPTION                      1
4  05/01/2022        A  INACTIVATED                      1

EDIT: For test dates with ACTIVATED if less like INACTIVATED create helper columns with aggregate first/last, compare datetimes in eval and last mapping CUSTOMER column:

df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)

s = (df.assign(ACTIVATED=df['DATE'].where(df['STATUS'].eq('ACTIVATED')),
               INACTIVATED=df['DATE'].where(df['STATUS'].eq('INACTIVATED')))
       .groupby('CUSTOMER').agg({'ACTIVATED':'first', 'INACTIVATED':'last'})
       .eval('ACTIVATED < INACTIVATED'))

df['ACTIVATED/INACTIVATED'] = df['CUSTOMER'].map(s).astype(int)
print (df)
        DATE CUSTOMER       STATUS  ACTIVATED/INACTIVATED
0 2022-01-01        A    ACTIVATED                      1
1 2022-01-02        A       ACTIVE                      1
2 2022-01-03        A     INACTIVE                      1
3 2022-01-04        A    EXCEPTION                      1
4 2022-01-05        A  INACTIVATED                      1

Create a column based of a status in another column

Answers (1)

Related Questions