Reputation: 15
I have tried different ways to slice the strings in a pandas column up to a specific character, based on a condition.
For example, consider Kaggle's Titanic data set. In the "Name" column, I would like to cut every name that contains a '(' character just before that character, so that no name contains brackets and each name keeps only the text that came before the opening bracket.
I used this way:
df.loc[df['Name'].str.rfind('(') > -1, 'Name'] = df['Name'].str.slice(0, df['Name'].str.rfind('('))
The idea is that when a name contains '(', it is sliced so that only the characters before the opening bracket are kept; otherwise the name (which has no opening bracket) is left unchanged.
My solution does not work: it produces NaN. How can I fix it?
Upvotes: 1
Views: 4699
Reputation: 59529
You can just use pd.Series.str.split to get everything before ' ('.
import pandas as pd
df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
'Cummings, Mrs. John Bradley (Florence Briggs)',
'Heikkinen, Miss. Laina',
'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
'Allen, Mr. William Henry']})
# Raw string so that \( is passed to the regex engine as an escaped bracket
df['Name'] = df['Name'].str.split(r' \(', expand=True)[0]
print(df)
Output:
                           Name
0       Braund, Mr. Owen Harris
1   Cummings, Mrs. John Bradley
2        Heikkinen, Miss. Laina
3  Futrelle, Mrs. Jacques Heath
4      Allen, Mr. William Henry
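As for why the attempt in the question gives NaN: as far as I know, str.slice expects plain integers for its start and stop arguments, so passing a whole Series as the stop position does not give each row its own cut point. If you want to keep the "find the bracket, then slice" logic from the question, a minimal sketch is to do it row by row with a small helper. The helper name cut_before_paren is made up for this example, and it uses find (first bracket) rather than rfind so that it matches the split approach above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                            'Cummings, Mrs. John Bradley (Florence Briggs)',
                            'Heikkinen, Miss. Laina',
                            'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
                            'Allen, Mr. William Henry']})

def cut_before_paren(name):
    # Hypothetical helper: keep the text before the first '(' if there is one
    idx = name.find('(')
    if idx == -1:
        return name  # no bracket, leave the name unchanged
    return name[:idx].rstrip()  # drop the space left before the bracket

df['Name'] = df['Name'].map(cut_before_paren)
print(df)
```

This assumes every value in the column is a string; missing values would need to be handled separately before calling the helper.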
Upvotes: 3