Jonny

Reputation: 63

Python regular expressions - split a string on some values but not all

I'm trying to build a function to split a list of names.

import re

name_ex = 'Futrelle, Mrs. Jacques Heath (Lily May Peel)'

split_name = re.split(r'\. |, | ', name_ex)
last_name = split_name[0]
title = split_name[1]
other_names = split_name[2:]

Printing split_name gives:

['Futrelle', 'Mrs', 'Jacques', 'Heath', '(Lily', 'May', 'Peel)']

However, what I want to achieve is:

['Futrelle', 'Mrs', 'Jacques', 'Heath', 'Lily May Peel']

Any idea how I would achieve this?

Additional context:

- Some names don't have the additional name in brackets.
- All names are in the order: last name, title, first name (middle name optional), bracketed name.

Upvotes: 2

Views: 51

Answers (2)

Ajax1234

Reputation: 71461

You can use re.findall to match the text inside the parentheses as a single item, and every other run of word characters separately:

import re
name_ex = 'Futrelle, Mrs. Jacques Heath (Lily May Peel)'
new_data = re.findall(r'(?<=\()[\w\s]+(?=\))|\w+', name_ex)
print(new_data)

Output:

['Futrelle', 'Mrs', 'Jacques', 'Heath', 'Lily May Peel']
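
To tie this back to the question, here is a minimal sketch that wraps the same pattern in a function returning the last name, title, and remaining names. The helper name parse_name and the second input (a name without a bracketed part) are illustrative assumptions, not part of the original post.

```python
import re

# Same pattern as above, as a raw string: text inside parentheses is kept
# whole; everything else is split into runs of word characters.
PATTERN = re.compile(r'(?<=\()[\w\s]+(?=\))|\w+')

def parse_name(name):
    """Hypothetical helper: return (last_name, title, other_names)."""
    parts = PATTERN.findall(name)
    return parts[0], parts[1], parts[2:]

with_brackets = parse_name('Futrelle, Mrs. Jacques Heath (Lily May Peel)')
without_brackets = parse_name('Braund, Mr. Owen Harris')  # illustrative input

print(with_brackets)     # ('Futrelle', 'Mrs', ['Jacques', 'Heath', 'Lily May Peel'])
print(without_brackets)  # ('Braund', 'Mr', ['Owen', 'Harris'])
```

Because \w does not match hyphens or apostrophes, a hyphenated or apostrophised name would be split into separate pieces, so the pattern may need adjusting for such data.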

Upvotes: 0

Rakesh

Reputation: 82785

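You can use re.match with named groups to capture each part of the name. Note that this pattern requires the bracketed name to be present, and the mname group keeps its leading space.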

Demo:

import re

name_ex = 'Futrelle, Mrs. Jacques Heath (Lily May Peel)'
m = re.match(r"(?P<lname>[A-Za-z]+), (?P<title>[A-Za-z]+)\. (?P<fname>[A-Za-z]+)(?P<mname>[\sA-Za-z]+)? \((?P<bname>.*?)\)", name_ex)
if m:
    print(m.groups())

Output:

('Futrelle', 'Mrs', 'Jacques', ' Heath', 'Lily May Peel')
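
Since the question says some names have no bracketed part, one possible variation is sketched below. It makes the middle and bracketed names optional and moves the separating space outside the mname group. The names NAME_RE and parse_name and the second test input are assumptions made for illustration; the pattern only covers plain ASCII letters, as in the demo above.

```python
import re

NAME_RE = re.compile(
    r"(?P<lname>[A-Za-z]+), (?P<title>[A-Za-z]+)\. (?P<fname>[A-Za-z]+)"
    r"(?: (?P<mname>[A-Za-z ]+?))?"   # optional middle name(s), no leading space
    r"(?: \((?P<bname>.*?)\))?"       # optional bracketed name
)

def parse_name(name):
    """Hypothetical helper: return a dict of name parts, or None if no match."""
    m = NAME_RE.fullmatch(name)
    return m.groupdict() if m else None

full = parse_name('Futrelle, Mrs. Jacques Heath (Lily May Peel)')
short = parse_name('Braund, Mr. Owen Harris')  # illustrative input

print(full['mname'], '|', full['bname'])    # Heath | Lily May Peel
print(short['mname'], '|', short['bname'])  # Harris | None
```

Groups that do not take part in the match come back as None, so callers should check for that before using mname or bname.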

Upvotes: 1
