haraujo
haraujo

Reputation: 33

Create a column based of a status in another column

can you help me please?

I have the following table in my dataframe:

DATE CUSTOMER STATUS
01/01/2022 A ACTIVATED
02/01/2022 A ACTIVE
03/01/2022 A INACTIVE
04/01/2022 A EXCEPTION
05/01/2022 A INACTIVATED

I want to mark my dataframe row when my client inactivated (status inactivated). However, I only want to catch when the status at least once has been ACTIVATED. If my client's status is INACTIVATED, but it hasn't passed at least once as ACTIVED, it doesn't matter to me. And also I want to get just only when INACTIVATED is AFTER ACTIVATED.

Could you help me how I can do this in python? I'm having difficulties, here are 2 examples of result I would like to get.

1o option:

DATE CUSTOMER STATUS ACTIVATED/INACTIVATED
01/01/2022 A ACTIVATED 1
02/01/2022 A ACTIVE 1
03/01/2022 A INACTIVE 1
04/01/2022 A EXCEPTION 1
05/01/2022 A INACTIVATED 1

2o option:

DATE CUSTOMER STATUS ACTIVATED/INACTIVATED
01/01/2022 A ACTIVATED
02/01/2022 A ACTIVE
03/01/2022 A INACTIVE
04/01/2022 A EXCEPTION
05/01/2022 A INACTIVATED 1

Thank you very much!

Upvotes: 1

Views: 62

Answers (1)

jezrael
jezrael

Reputation: 863301

You can compare STATUS per groups and test for values s - 1 if exist both, 0 if no:

s = {'ACTIVATED','INACTIVATED'}
df['ACTIVATED/INACTIVATED'] = (df.groupby('CUSTOMER')['STATUS']
                                 .transform(lambda x: set(x) >= s)
                                 .astype(int))
print (df)
         DATE CUSTOMER       STATUS  ACTIVATED/INACTIVATED
0  01/01/2022        A    ACTIVATED                      1
1  02/01/2022        A       ACTIVE                      1
2  03/01/2022        A     INACTIVE                      1
3  04/01/2022        A    EXCEPTION                      1
4  05/01/2022        A  INACTIVATED                      1

EDIT: For test dates with ACTIVATED if less like INACTIVATED create helper columns with aggregate first/last, compare datetimes in eval and last mapping CUSTOMER column:

df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)

s = (df.assign(ACTIVATED=df['DATE'].where(df['STATUS'].eq('ACTIVATED')),
               INACTIVATED=df['DATE'].where(df['STATUS'].eq('INACTIVATED')))
       .groupby('CUSTOMER').agg({'ACTIVATED':'first', 'INACTIVATED':'last'})
       .eval('ACTIVATED < INACTIVATED'))

df['ACTIVATED/INACTIVATED'] = df['CUSTOMER'].map(s).astype(int)
print (df)
        DATE CUSTOMER       STATUS  ACTIVATED/INACTIVATED
0 2022-01-01        A    ACTIVATED                      1
1 2022-01-02        A       ACTIVE                      1
2 2022-01-03        A     INACTIVE                      1
3 2022-01-04        A    EXCEPTION                      1
4 2022-01-05        A  INACTIVATED                      1

Upvotes: 1

Related Questions