Splitting string twice and getting only specific elements

Question

The following dataframe shows the purchases (conversions) generated by each marketing campaign path. For example, 10 users purchased a product after clicking on the ads of Campaigns A and B:

|                      path                          |  conversions | 
                      ------                             ---------        
| Campaign A | Laptop | Dec,Campaign B | Jan  |Phone |     10       | 
| Campaign C | Aug    | Camera,Campaign D2022 | Game |     35       |

The expected output is the following:

|  product    |  conversions | 
   ------         ---------        
| Laptop,Phone|     10       | 
| Camera,Game |     35       |

I only want to display the products that the ad campaigns refer to, along with their corresponding conversions. Please note that the position of the product names may differ and that not all campaigns have the same number of elements (e.g. Campaign D2022).

Also, the order of the interactions with the campaigns is relevant, so it must be preserved in the output: Campaign A, Campaign B != Campaign B, Campaign A.

I tried to solve it like this, but it did not work:

df['path']=df['path'].replace(' | ',',').str.split() 

journey=df['path']

items=['Laptop','Phone','Camera','Game']

df['product']=[x for x in journey if x in items]

After running the code above, I received an error message:

Length of values (0) does not match length of index (80566)
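A note on why the attempt above yields an empty list, followed by a minimal sketch of one possible approach. `Series.replace(' | ', ',')` matches whole cell values rather than substrings, so nothing is replaced, and `.str.split()` then turns every cell into a list of whitespace-separated tokens. The check `x in items` therefore compares a whole list against the product names, is never true, and the comprehension returns zero elements for 80566 rows. The sketch below splits each path twice (first on `,`, then on `|`) and keeps only the known products in their original order. The function name `extract_products` and the constant `ITEMS` are hypothetical, and it assumes that every product name is listed in `items` and that no field contains a comma or pipe of its own.

```python
# Assumed list of known product names, copied from `items` in the question.
ITEMS = ['Laptop', 'Phone', 'Camera', 'Game']

def extract_products(path, items=ITEMS):
    """Return the known products found in `path`, comma-joined, in path order."""
    found = []
    # First split: one chunk per campaign (campaigns are comma-separated).
    for campaign in path.split(','):
        # Second split: fields inside a campaign are pipe-separated;
        # strip() absorbs the inconsistent spacing around '|'.
        fields = [field.strip() for field in campaign.split('|')]
        # Keep only fields that are known products, preserving their order.
        found.extend(field for field in fields if field in items)
    return ','.join(found)

# The two sample paths from the question.
paths = [
    'Campaign A | Laptop | Dec,Campaign B | Jan  |Phone',
    'Campaign C | Aug    | Camera,Campaign D2022 | Game',
]
products = [extract_products(p) for p in paths]
print(products)
```

With pandas, the same function could be applied row by row with `df['product'] = df['path'].apply(extract_products)`, assuming `df['path']` still holds the original strings and has not been overwritten by the first line of the attempt. Because the function returns exactly one value per row, the length mismatch cannot occur.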
