mikebmassey

Reputation: 8604

In pandas, set a new column and update existing column

In a pandas DataFrame, I have a last name field that looks like this:

df = pd.DataFrame(['Jones Jr', 'Smith'], columns=['LastName'])

I am trying to create a new column named 'Generation' while stripping the generation suffix from the last name, so the outcome would look like this:

df2 = pd.DataFrame([('Jones', 'Jr'), ('Smith', '')], columns=['LastName', 'Generation'])

I could set the Generation column then go back and remove the Generation from the last name:

df.loc[df['LastName'].str[-3:] == ' Jr', 'Generation'] = 'Jr'
df.loc[df['LastName'].str[-3:] == ' Jr', 'LastName'] = df['LastName'].str[:-3]
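
Written out as a self-contained sketch with the condition computed once, the two steps look roughly like this. The `mask` name and the use of `str.endswith` are introduced here for illustration and are not part of the snippet above.

```python
import pandas as pd

df = pd.DataFrame(['Jones Jr', 'Smith'], columns=['LastName'])

# Compute the boolean mask once so both assignments target the same rows.
mask = df['LastName'].str.endswith(' Jr')

# Step 1: record the suffix; rows that do not match are left as NaN.
df.loc[mask, 'Generation'] = 'Jr'

# Step 2: drop the trailing ' Jr' from the matching last names.
df.loc[mask, 'LastName'] = df.loc[mask, 'LastName'].str[:-3]
```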

However, that's two steps and it seems like performing an update in one swoop would be best.

I thought about using apply, but that would mean updating two columns only where a condition matches, and I couldn't find anything close to that.

Upvotes: 2

Views: 256

Answers (1)

MaxU - stand with Ukraine

Reputation: 210982

You can use the .str.extract() method with named groups, which become the column names of the result:

In [19]: df2 = df.LastName.str.extract(r'(?P<LastName>\w+)\s?(?P<Generation>Jr|Sr)?', expand=True)

In [20]: df2
Out[20]:
  LastName Generation
0    Jones         Jr
1    Smith        NaN
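
If you want an empty string instead of NaN, as in the desired output from the question, and want the result written back to the original frame, something along these lines may work. This is a minimal sketch: the `pattern` and `parts` names, the `fillna('')` call, and the use of `assign` are additions for illustration, and the regex only covers the `Jr` and `Sr` suffixes shown above.

```python
import pandas as pd

df = pd.DataFrame(['Jones Jr', 'Smith'], columns=['LastName'])

# Named groups become the column names of the extracted frame.
pattern = r'(?P<LastName>\w+)\s?(?P<Generation>Jr|Sr)?'
parts = df['LastName'].str.extract(pattern, expand=True)

# Overwrite LastName and add Generation in a single call,
# replacing a missing suffix with an empty string.
df = df.assign(
    LastName=parts['LastName'],
    Generation=parts['Generation'].fillna(''),
)
```

Multi-word surnames or other suffixes (for example `III`) would need a different pattern.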

Upvotes: 3
