Jonathan Kennedy
Jonathan Kennedy

Reputation: 353

If value contains string, then set another column value

I have a dataframe in Pandas with a column called 'Campaign' it has values like this:

"UK-Sample-Car Rental-Car-Broad-MatchPost"

I need to be able to pull out that the string contains the word 'Car Rental' and set another Product column to be 'CAR'. The hyphen is not always separating out the word Car, so finding the string this way isn't an possible.

How can I achieve this in Pandas/Python?

Upvotes: 2

Views: 2201

Answers (1)

firelynx
firelynx

Reputation: 32224

pandas as some sweet string functions you can use

for example, like this:

df['vehicle'] = df.Campaign.str.extract('(Car).Rental').str.upper()

This sets the column vehicle to what is contained inside the parenthesis of the regular expression given to the extract function.

Also the str.upper makes it uppercase

Extra Bonus:

If you want to assign vehicle something that is not in the original string, you have to take a few more steps, but we still use the string functions This time str.contains .

is_motorcycle = df.Campaign.str.contains('Motorcycle')
df['vehicle'] = pd.Series(["MC"] * len(df)) * is_motorcycle

The second line here creates a series of "MC" strings, then masks it on the entries which we found to be motorcycles.

If you want to combine multiple, I suggest you use the map function:

vehicle_list = df.Campaign.str.extract('(Car).Rental|(Motorcycle)|(Hotel)|(.*)')
vehicle = vehicle_list.apply(lambda x: x[x.last_valid_index()], axis=1)
df['vehicle'] = vehicle.map({'Car':'Car campaign', 'Hotel':'Hotel campaign'})

This first extracts the data into a list of options per line. The cases are split by | and the last one is just a catch-all which is needed for the Series.apply function below. The Series.map function is pretty straight forward, if the captured data is 'Car', we set 'Car campaign', and 'Hotel' we set 'Hotel campaign' etc.

Upvotes: 4

Related Questions