Reputation: 8604
In a pandas DataFrame, I have a last name field that looks like this:
df = pd.DataFrame(['Jones Jr', 'Smith'], columns=['LastName'])
I am trying to set a new column named 'Generation' while stripping the generation suffix from the last name, so the outcome would look like this:
df2 = pd.DataFrame([('Jones', 'Jr'), ('Smith', '')], columns=['LastName', 'Generation'])
I could set the Generation column then go back and remove the Generation from the last name:
df.loc[df['LastName'].str[-3:] == ' Jr', 'Generation'] = 'Jr'
df.loc[df['LastName'].str[-3:] == ' Jr', 'LastName'] = df['LastName'].str[:-3]
However, that's two steps, and it seems like performing the update in one pass would be best.
I considered apply, but I would need to update two columns on only the rows that match a condition, and I couldn't find an example close to that.
Upvotes: 2
Views: 256
Reputation: 210982
You can use the .str.extract() method with named groups, which returns one column per group:
In [19]: df2 = df.LastName.str.extract(r'(?P<LastName>\w+)\s?(?P<Generation>Jr|Sr)?', expand=True)
In [20]: df2
Out[20]:
  LastName Generation
0    Jones         Jr
1    Smith        NaN
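The extract result leaves NaN where there is no suffix, while the question asks for an empty string. A minimal sketch of one way to get there, assuming the same pattern as above and that the result should be written back onto the original df (the name `parts` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(['Jones Jr', 'Smith'], columns=['LastName'])

# Same pattern as above: surname, optional space, optional Jr/Sr suffix.
pattern = r'(?P<LastName>\w+)\s?(?P<Generation>Jr|Sr)?'

# One column per named group; rows without a suffix get NaN in 'Generation'.
parts = df['LastName'].str.extract(pattern, expand=True)

# Overwrite LastName and add Generation in a single statement,
# replacing the missing suffixes with empty strings.
df = df.assign(
    LastName=parts['LastName'],
    Generation=parts['Generation'].fillna(''),
)
```

Because `\w+` stops at the first non-word character, this sketch assumes single-word surnames; names containing spaces, hyphens, or apostrophes would need a different pattern.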
Upvotes: 3