Python Pandas slice column string up to a character based on condition

Question

I have tried several ways to slice pandas column strings up to a specific character, based on a condition.

For example, consider Kaggle's Titanic data set. I would like to slice every name in the "Name" column up to the '(' character whenever the name contains one, so that no name contains brackets and each name keeps only the characters before the opening bracket. In other words, I want to drop the bracketed part and keep what comes before it.

Sample of my data set

I used this approach:

df.loc[df['Name'].str.rfind('(') > -1, 'Name'] = df['Name'].str.slice(0, df['Name'].str.rfind('('))

The idea is that whenever a name contains '(', it gets sliced so that only the characters before the opening bracket are kept; otherwise the name, which has no opening bracket, is left as it is.

This does not work, because it produces NaN values. How can I fix it?
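A note on where the NaN comes from, and a sketch of the same conditional slice that does work. As far as I can tell, Series.str.slice expects scalar start/stop integers; when it is handed a whole Series of positions, slicing each (object-dtype) string fails internally and pandas reports those elements as NaN, which the .loc assignment then writes back. The sketch below keeps the rfind-based mask but slices each matching name with its own position in plain Python. The names pos and has_bracket are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                           'Cummings, Mrs. John Bradley (Florence Briggs)',
                           'Heikkinen, Miss. Laina']})

# Position of the last '(' in each name; -1 when there is no bracket.
pos = df['Name'].str.rfind('(')
has_bracket = pos > -1

# str.slice only accepts scalar start/stop, so slice each matching name
# with its own position, then strip the space left before the bracket.
df.loc[has_bracket, 'Name'] = [
    name[:p].rstrip()
    for name, p in zip(df.loc[has_bracket, 'Name'], pos[has_bracket])
]

print(df)
```

Because rfind locates the last '(', a name containing more than one bracket is cut at the last one; splitting on the first ' (' instead would cut at the first.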

ALollz · Accepted Answer

You can just use pd.Series.str.split to get everything before ' ('.

import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                           'Cummings, Mrs. John Bradley (Florence Briggs)',
                           'Heikkinen, Miss. Laina',
                           'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
                           'Allen, Mr. William Henry']})

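# A pattern longer than one character is treated as a regex, so '(' is escaped;
# the raw string avoids Python's invalid-escape warning. Keep the part before ' ('.
df['Name'] = df['Name'].str.split(r' \(', expand=True)[0]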

Output:

print(df)
                           Name
0       Braund, Mr. Owen Harris
1   Cummings, Mrs. John Bradley
2        Heikkinen, Miss. Laina
3  Futrelle, Mrs. Jacques Heath
4      Allen, Mr. William Henry
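Another option, not part of the answer above but using the same .str accessor, is a regular-expression replace that deletes the first '(' together with any whitespace before it and everything after it. A minimal sketch, assuming the bracketed part should always be dropped through to the end of the name, as in the Titanic sample:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                           'Cummings, Mrs. John Bradley (Florence Briggs)',
                           'Futrelle, Mrs. Jacques Heath (Lily May Peel)']})

# Remove optional whitespace, the first '(' and everything after it.
# Names without '(' do not match the pattern and are returned unchanged.
df['Name'] = df['Name'].str.replace(r'\s*\(.*$', '', regex=True)

print(df)
```

Passing regex=True explicitly matters here, since newer pandas versions treat the pattern as a literal string by default.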
