Chris Morgan
Chris Morgan

Reputation: 21

Split data between values of a row

I am looking to categorise some data I have. The data set contains the changelog behind a media campaign. I would like to have a Set-Up, In-Flight and Post-Flight

Example below:

Date of change | Change type
01/01/2021  Settings Change (4393)
02/01/2021  Settings Change (7490)
05/01/2021  Campaign set live
06/01/2021  Settings Change (6524)
07/01/2021  Settings Change (1321)
08/01/2021  Campaign paused
09/01/2021  Settings Change (8822)
11/01/2021  Settings Change (5891)
12/01/2021  Campaign Set live
13/01/2021  Settings Change (4669)
14/01/2021  Settings Change (2287)
15/01/2021  Campaign ends
16/01/2021  Settings Change (5649)

I would like to see something like:

Date of change | Change | Stage of change
01/01/2021  Settings Change (4393) Pre-Flight
02/01/2021  Settings Change (7490) Pre-Flight
05/01/2021  Campaign live          Campaign live
06/01/2021  Settings Change (6524) In-Flight
07/01/2021  Settings Change (1321) In-Flight
08/01/2021  Campaign paused        Campaign paused
09/01/2021  Settings Change (8822) In-Flight (Paused)
11/01/2021  Settings Change (5891) In-Flight (Paused)
12/01/2021  Campaign live          Campaign live
13/01/2021  Settings Change (4669) In-Flight
14/01/2021  Settings Change (2287) In-Flight
15/01/2021  Campaign ends          Campaign ends
16/01/2021  Settings Change (5649) Post-Flight

I have tried splitting based on a value but this is before or after a certain value or based on something like a date, not on the Change Type.

Any help is hugely appreciated.

Upvotes: 1

Views: 73

Answers (1)

Cimbali
Cimbali

Reputation: 11395

Let’s first normalize your Change column from your Change type row, in particular the Campaign [Ss]et live rows:

>>> df['Change'] = df['Change type'].str.replace('Campaign set live', 'Campaign live', flags=re.IGNORECASE)
>>> df = df.drop(columns=['Change type'])
>>> df
   Date of change                  Change
0      01/01/2021  Settings Change (4393)
1      02/01/2021  Settings Change (7490)
2      05/01/2021           Campaign live
3      06/01/2021  Settings Change (6524)
4      07/01/2021  Settings Change (1321)
5      08/01/2021         Campaign paused
6      09/01/2021  Settings Change (8822)
7      11/01/2021  Settings Change (5891)
8      12/01/2021           Campaign live
9      13/01/2021  Settings Change (4669)
10     14/01/2021  Settings Change (2287)
11     15/01/2021           Campaign ends
12     16/01/2021  Settings Change (5649)

Then we basically want to keep the rows that define campaign states as-is, and fill them based on the previous values.

We can use the same dictionary for both things, respectively

  1. defining the fill values (that we can then propagate with fillna), and
  2. masking the rows where we don’t want this filled value to reset it to the origina Change column:
>>> state_after_change = {
...     'Campaign live': 'In-Flight',
...     'Campaign paused': 'In-Flight (Paused)',
...     'Campaign ends': 'Post-Flight'
... }
>>> df['State of change'] = df['Change'].map(state_after_change).ffill().fillna('Pre-flight')
>>> df['State of change']
0             Pre-flight
1             Pre-flight
2              In-Flight
3              In-Flight
4              In-Flight
5     In-Flight (Paused)
6     In-Flight (Paused)
7     In-Flight (Paused)
8              In-Flight
9              In-Flight
10             In-Flight
11           Post-Flight
12           Post-Flight
Name: State of change, dtype: object
>>> df['State of change'] = df['State of change'].mask(df['Change'].isin(state_after_change), df['Change'])
>>> df
   Date of change                  Change     State of change
0      01/01/2021  Settings Change (4393)          Pre-flight
1      02/01/2021  Settings Change (7490)          Pre-flight
2      05/01/2021           Campaign live       Campaign live
3      06/01/2021  Settings Change (6524)           In-Flight
4      07/01/2021  Settings Change (1321)           In-Flight
5      08/01/2021         Campaign paused     Campaign paused
6      09/01/2021  Settings Change (8822)  In-Flight (Paused)
7      11/01/2021  Settings Change (5891)  In-Flight (Paused)
8      12/01/2021           Campaign live       Campaign live
9      13/01/2021  Settings Change (4669)           In-Flight
10     14/01/2021  Settings Change (2287)           In-Flight
11     15/01/2021           Campaign ends       Campaign ends
12     16/01/2021  Settings Change (5649)         Post-Flight

Upvotes: 2

Related Questions