BWO
BWO

Reputation: 115

Use python regex to split a string column to create new columns of a data frame in Python

I need to extract the last part of a string within a series to create new series(columns) in my dataframe. Below is the string column example

Joke_identifier
Klint De Drunk Enugu 1
Klint De Drunk Enugu 2
Klint De Drunk Enugu 3
Gordons Lagos 1
Gordons Lagos 2

I tried using this line of code to extract the last part of the string

import re

train_data.Joke_identifier.str.split(r"[a-zA-Z]\s[0-9]\Z", expand=True)

But it returns Klint De Drunk Enug

I would get a new data frame as shown below

Joke_identifier Joker City Edition
Klint De Drunk Enugu 1 Klint De Drunk Enugu 1
Klint De Drunk Enugu 2 Klint De Drunk Enugu 2
Klint De Drunk Enugu 3 Klint De Drunk Enugu 3
Gordons Lagos 1 Gordons Lagos 1
Gordons Lagos 2 Gordons Lagos 2

Please help

Upvotes: 2

Views: 1145

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626802

You can use Series.str.extract:

import pandas as pd
df = pd.DataFrame({'Joke_identifier':['Klint De Drunk Enugu 1', 'Klint De Drunk Enugu 2', 'Klint De Drunk Enugu 3', 'Gordons Lagos 1','Gordons Lagos 2']})
df[['Joker','City','Edition']] = df['Joke_identifier'].str.extract(r"^(.*)\s+(\S+)\s+(\d+)$", expand=True)
df
#            Joke_identifier           Joker   City Edition
#  0  Klint De Drunk Enugu 1  Klint De Drunk  Enugu       1
#  1  Klint De Drunk Enugu 2  Klint De Drunk  Enugu       2
#  2  Klint De Drunk Enugu 3  Klint De Drunk  Enugu       3
#  3         Gordons Lagos 1         Gordons  Lagos       1
#  4         Gordons Lagos 2         Gordons  Lagos       2

See the regex demo.

The pattern matches

  • ^ - start of string
  • (.*) - Capturing group 1 (Column "Joker"): any zero or more chars other than line break chars, as many as possible
  • \s+ - one or more whitespaces
  • (\S+) - Capturing group 2 (Column "City"): any one or more whitespaces
  • \s+ - one or more whitespaces
  • (\d+) - Capturing group 3 (Column "Edition"): one or more digits
  • $ - end of string.

Upvotes: 1

Related Questions