Use python regex to split a string column to create new columns of a data frame in Python

Question

I need to extract the last part of a string within a series to create new series(columns) in my dataframe. Below is the string column example

Joke_identifier
Klint De Drunk Enugu 1
Klint De Drunk Enugu 2
Klint De Drunk Enugu 3
Gordons Lagos 1
Gordons Lagos 2

I tried using this line of code to extract the last part of the string

import re

train_data.Joke_identifier.str.split(r"[a-zA-Z]\s[0-9]\Z", expand=True)

But it returns Klint De Drunk Enug

I would get a new data frame as shown below

Joke_identifier	Joker	City	Edition
Klint De Drunk Enugu 1	Klint De Drunk	Enugu	1
Klint De Drunk Enugu 2	Klint De Drunk	Enugu	2
Klint De Drunk Enugu 3	Klint De Drunk	Enugu	3
Gordons Lagos 1	Gordons	Lagos	1
Gordons Lagos 2	Gordons	Lagos	2

Please help

Wiktor Stribiżew · Accepted Answer

You can use Series.str.extract:

import pandas as pd
df = pd.DataFrame({'Joke_identifier':['Klint De Drunk Enugu 1', 'Klint De Drunk Enugu 2', 'Klint De Drunk Enugu 3', 'Gordons Lagos 1','Gordons Lagos 2']})
df[['Joker','City','Edition']] = df['Joke_identifier'].str.extract(r"^(.*)\s+(\S+)\s+(\d+)$", expand=True)
df
#            Joke_identifier           Joker   City Edition
#  0  Klint De Drunk Enugu 1  Klint De Drunk  Enugu       1
#  1  Klint De Drunk Enugu 2  Klint De Drunk  Enugu       2
#  2  Klint De Drunk Enugu 3  Klint De Drunk  Enugu       3
#  3         Gordons Lagos 1         Gordons  Lagos       1
#  4         Gordons Lagos 2         Gordons  Lagos       2

See the regex demo.

The pattern matches

^ - start of string
(.*) - Capturing group 1 (Column "Joker"): any zero or more chars other than line break chars, as many as possible
\s+ - one or more whitespaces
(\S+) - Capturing group 2 (Column "City"): any one or more whitespaces
\s+ - one or more whitespaces
(\d+) - Capturing group 3 (Column "Edition"): one or more digits
$ - end of string.

Use python regex to split a string column to create new columns of a data frame in Python

Answers (1)

Related Questions