programmer987

Reputation: 119

Splitting string twice and getting only specific elements

The following dataframe shows the purchases (conversions) generated by each marketing campaign path. For example, 10 users purchased a product after clicking on the ads of Campaigns A and B:

| path                                                 | conversions |
|------------------------------------------------------|-------------|
| "Campaign A | Laptop | Dec,Campaign B | Jan | Phone" | 10          |
| "Campaign C | Aug | Camera,Campaign D2022 | Game"    | 35          |

The expected output is the following:

| product      | conversions |
|--------------|-------------|
| Laptop,Phone | 10          |
| Camera,Game  | 35          |

I only want to display the products that the ad campaigns refer to, together with their corresponding conversions. Please note that the position of the product name within a campaign may differ (Laptop is the second field of Campaign A, but Phone is the third field of Campaign B) and that campaigns do not all have the same number of elements (e.g. Campaign D2022).
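To make the "split twice" idea concrete, here is a minimal pure-Python sketch on the two sample rows. It assumes campaigns are separated by `,` and the fields inside a campaign by ` | `, as in the table above; `split_path` is a hypothetical helper name, not part of pandas.

```python
# Sample paths copied from the question's table (assumed separators: ',' and ' | ').
paths = [
    "Campaign A | Laptop | Dec,Campaign B | Jan | Phone",
    "Campaign C | Aug | Camera,Campaign D2022 | Game",
]


def split_path(path):
    """First split the path into campaigns on ',', then each campaign into fields on ' | '."""
    return [campaign.split(" | ") for campaign in path.split(",")]


# Print the number of fields and the fields themselves for every campaign.
for path in paths:
    for fields in split_path(path):
        print(len(fields), fields)
```

It shows three fields for Campaigns A, B and C but only two for Campaign D2022, and the product sits at index 1 for A and D2022 but at index 2 for B and C, so a fixed index cannot pick out every product.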

Also, the order of the interactions with the campaigns is relevant (Campaign A, Campaign B != Campaign B, Campaign A), so the products should be listed in the order in which their campaigns appear in the path.

I tried to solve it like this, but it did not work:

df['path'] = df['path'].replace(' | ', ',').str.split()

journey = df['path']

items = ['Laptop', 'Phone', 'Camera', 'Game']

df['product'] = [x for x in journey if x in items]

After running the code above, I received an error message:

Length of values (0) does not match length of index (80566)
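The length mismatch can be reproduced without pandas. In pandas, `Series.replace` with a plain string and the default `regex=False` only replaces cells whose whole value equals that string, so the pipes survive, and `.str.split()` then turns each row into a list of whitespace tokens. The comprehension iterates over those per-row lists, and `x in items` compares a whole list against strings, which is never true; the result is an empty list that cannot be assigned to a column with 80566 rows. The sketch below mimics `journey` with plain lists, assuming the real rows look like the sample.

```python
items = ['Laptop', 'Phone', 'Camera', 'Game']

# Mimic `journey` after .str.split(): one list of whitespace tokens per row.
journey = [
    "Campaign A | Laptop | Dec,Campaign B | Jan | Phone".split(),
    "Campaign C | Aug | Camera,Campaign D2022 | Game".split(),
]

# The original comprehension: x is a whole row (a list), so `x in items` is always False.
print([x for x in journey if x in items])

# Looking at tokens inside each row gives one result per row, but whitespace
# splitting glues 'Camera,Campaign' together, so Camera is still missed.
print([[tok for tok in row if tok in items] for row in journey])
```

The first line prints `[]` and the second prints `[['Laptop', 'Phone'], ['Game']]`, which is why the path has to be split on `,` first and then on `|`.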

Upvotes: 0

Views: 63

Answers (1)

Tim Roberts

Reputation: 54698

This is untested, but this is the kind of thing you need:

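def get_products(path):
    # Split the path into campaigns on ',', split each campaign on ' | ',
    # and keep the second field (index 1) of each campaign.
    return ','.join(part.split(' | ')[1] for part in path.split(','))
...
df['product'] = df['path'].apply(get_products)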
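Taking index `[1]` assumes the product is always the second field; on the sample rows that yields `Laptop,Jan` and `Aug,Game`, because Phone and Camera sit in the third field. A minimal variant, assuming the product names are known in advance (as with the `items` list in the question), keeps whichever field names a product. `KNOWN_PRODUCTS` and `get_known_products` are hypothetical names.

```python
# Hypothetical product set, mirroring `items` from the question.
KNOWN_PRODUCTS = {"Laptop", "Phone", "Camera", "Game"}


def get_known_products(path, products=KNOWN_PRODUCTS):
    """Return the products named in `path`, comma-joined in path order."""
    found = []
    for campaign in path.split(","):  # first split: one chunk per campaign
        # Second split: fields within a campaign; strip() tolerates uneven
        # spacing around '|', such as "Jan  |Phone".
        fields = [field.strip() for field in campaign.split("|")]
        # Keep every field that names a known product, wherever it appears.
        found.extend(field for field in fields if field in products)
    return ",".join(found)


print(get_known_products("Campaign A | Laptop | Dec,Campaign B | Jan | Phone"))  # Laptop,Phone
print(get_known_products("Campaign C | Aug | Camera,Campaign D2022 | Game"))  # Camera,Game
print(get_known_products("Campaign B | Jan | Phone,Campaign A | Laptop | Dec"))  # Phone,Laptop
```

It plugs into the same call as above, `df['product'] = df['path'].apply(get_known_products)`, after which `df[['product', 'conversions']]` has the two columns of the expected output. Because the campaigns are walked in path order, reversing the campaigns reverses the products. The trade-off is that any product missing from the set is silently dropped.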

Upvotes: 1
