Reputation: 21
I am looking to categorise some data I have. The data set contains the changelog behind a media campaign. I would like to have a Set-Up, In-Flight and Post-Flight
Example below:
Date of change | Change type
01/01/2021 Settings Change (4393)
02/01/2021 Settings Change (7490)
05/01/2021 Campaign set live
06/01/2021 Settings Change (6524)
07/01/2021 Settings Change (1321)
08/01/2021 Campaign paused
09/01/2021 Settings Change (8822)
11/01/2021 Settings Change (5891)
12/01/2021 Campaign Set live
13/01/2021 Settings Change (4669)
14/01/2021 Settings Change (2287)
15/01/2021 Campaign ends
16/01/2021 Settings Change (5649)
I would like to see something like:
Date of change | Change | Stage of change
01/01/2021 Settings Change (4393) Pre-Flight
02/01/2021 Settings Change (7490) Pre-Flight
05/01/2021 Campaign live Campaign live
06/01/2021 Settings Change (6524) In-Flight
07/01/2021 Settings Change (1321) In-Flight
08/01/2021 Campaign paused Campaign paused
09/01/2021 Settings Change (8822) In-Flight (Paused)
11/01/2021 Settings Change (5891) In-Flight (Paused)
12/01/2021 Campaign live Campaign live
13/01/2021 Settings Change (4669) In-Flight
14/01/2021 Settings Change (2287) In-Flight
15/01/2021 Campaign ends Campaign ends
16/01/2021 Settings Change (5649) Post-Flight
I have tried splitting based on a value but this is before or after a certain value or based on something like a date, not on the Change Type.
Any help is hugely appreciated.
Upvotes: 1
Views: 73
Reputation: 11395
Let’s first normalize your Change
column from your Change type
row, in particular the Campaign [Ss]et live
rows:
>>> df['Change'] = df['Change type'].str.replace('Campaign set live', 'Campaign live', flags=re.IGNORECASE)
>>> df = df.drop(columns=['Change type'])
>>> df
Date of change Change
0 01/01/2021 Settings Change (4393)
1 02/01/2021 Settings Change (7490)
2 05/01/2021 Campaign live
3 06/01/2021 Settings Change (6524)
4 07/01/2021 Settings Change (1321)
5 08/01/2021 Campaign paused
6 09/01/2021 Settings Change (8822)
7 11/01/2021 Settings Change (5891)
8 12/01/2021 Campaign live
9 13/01/2021 Settings Change (4669)
10 14/01/2021 Settings Change (2287)
11 15/01/2021 Campaign ends
12 16/01/2021 Settings Change (5649)
Then we basically want to keep the rows that define campaign states as-is, and fill them based on the previous values.
We can use the same dictionary for both things, respectively
fillna
), andChange
column:>>> state_after_change = {
... 'Campaign live': 'In-Flight',
... 'Campaign paused': 'In-Flight (Paused)',
... 'Campaign ends': 'Post-Flight'
... }
>>> df['State of change'] = df['Change'].map(state_after_change).ffill().fillna('Pre-flight')
>>> df['State of change']
0 Pre-flight
1 Pre-flight
2 In-Flight
3 In-Flight
4 In-Flight
5 In-Flight (Paused)
6 In-Flight (Paused)
7 In-Flight (Paused)
8 In-Flight
9 In-Flight
10 In-Flight
11 Post-Flight
12 Post-Flight
Name: State of change, dtype: object
>>> df['State of change'] = df['State of change'].mask(df['Change'].isin(state_after_change), df['Change'])
>>> df
Date of change Change State of change
0 01/01/2021 Settings Change (4393) Pre-flight
1 02/01/2021 Settings Change (7490) Pre-flight
2 05/01/2021 Campaign live Campaign live
3 06/01/2021 Settings Change (6524) In-Flight
4 07/01/2021 Settings Change (1321) In-Flight
5 08/01/2021 Campaign paused Campaign paused
6 09/01/2021 Settings Change (8822) In-Flight (Paused)
7 11/01/2021 Settings Change (5891) In-Flight (Paused)
8 12/01/2021 Campaign live Campaign live
9 13/01/2021 Settings Change (4669) In-Flight
10 14/01/2021 Settings Change (2287) In-Flight
11 15/01/2021 Campaign ends Campaign ends
12 16/01/2021 Settings Change (5649) Post-Flight
Upvotes: 2