ensynch

Reputation: 47

Extract seasons and years from a string column in pandas

Is there another way to extract the year from a string column and create two new columns from it, one for the season and one for the year?

I tried this method and it seems to work, but it only extracts the year, and only for some of the rows:

year = df['premiered'].str.findall(r'(\d{4})').str.get(0)
df1 = df.assign(year = year.values)

Output:

|premiered|year|
|---------|----|
|Spring 1998|1998|
|Spring 2001|2001|
|Fall 2016|NaN|
|Fall 2016|NaN|

Upvotes: 1

Views: 164

Answers (2)

tdy

Reputation: 41337

Use Series.str.split with the expand option:

expand: Expand the split strings into separate columns.

df[['season', 'year']] = df['premiered'].str.split(expand=True)

#      premiered  season  year
# 0  Spring 1998  Spring  1998
# 1  Spring 2001  Spring  2001
# 2    Fall 2016    Fall  2016
# 3    Fall 2016    Fall  2016
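The two-column assignment above relies on every value splitting into exactly two parts. As a minimal sketch, assuming the year is always the last whitespace-separated token, Series.str.rsplit with n=1 keeps the result at two columns even if a season label contained a space. The sample data is the frame from the question.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']})

# Split from the right, at most once, so there are never more than two columns
parts = df['premiered'].str.rsplit(n=1, expand=True)

# Assign the two resulting columns by position
df[['season', 'year']] = parts

print(df)
```

The year column still holds strings at this point.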

Or use Series.str.extract with a regex:

  • (\w+) -- capture 1+ word characters
  • \s* -- 0+ whitespaces
  • (\d+) -- capture 1+ digits
df[['season', 'year']] = df['premiered'].str.extract(r'(\w+)\s*(\d+)')

#      premiered  season  year
# 0  Spring 1998  Spring  1998
# 1  Spring 2001  Spring  2001
# 2    Fall 2016    Fall  2016
# 3    Fall 2016    Fall  2016
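A variation on the regex approach, sketched here with named capture groups: Series.str.extract uses the group names as column names, and rows that do not match come back as NaN instead of raising. The stricter pattern (\s+ and \d{4}) and the 'Unknown' row are assumptions added for illustration and are not part of the original data.

```python
import pandas as pd

# 'Unknown' is a hypothetical non-matching value, added to show the NaN behaviour
df = pd.DataFrame({'premiered': ['Spring 1998', 'Fall 2016', 'Unknown']})

# Named groups become the column names of the extracted frame
extracted = df['premiered'].str.extract(r'(?P<season>\w+)\s+(?P<year>\d{4})')

# Attach the new columns to the original frame by index
df = df.join(extracted)

print(df)
```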

Both approaches produce strings, so it is also a good idea to convert the new year column to numeric:

df['year'] = df['year'].astype(int)
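Note that astype(int) raises if the column contains missing values. A minimal sketch of a more tolerant conversion, assuming some rows may have no year: pd.to_numeric with errors='coerce' turns unparseable entries into NaN, and the nullable 'Int64' dtype keeps the rest as integers. The None entry below is a made-up example value.

```python
import pandas as pd

# Hypothetical year column with one missing entry
df = pd.DataFrame({'year': ['1998', '2001', None, '2016']})

# Coerce unparseable or missing values to NaN, then use the nullable integer dtype
df['year'] = pd.to_numeric(df['year'], errors='coerce').astype('Int64')

print(df['year'])
```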

Upvotes: 1

ArchAngelPwn

Reputation: 3046

You could apply Python's str.split to each value and take the second part as the year:

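import pandas as pd

data = {'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']}
df = pd.DataFrame(data)
# Split each string on the space and keep the second token (the year, still a string)
df['year'] = df['premiered'].apply(lambda x: x.split(' ')[1])
df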

Upvotes: 0
