alofgran

Reputation: 477

Python regex re.sub() is not matching and replacing as expected

The following regex isn't replacing substrings as expected.

I've tried running the code with the following modifications (one at a time, of course), all with no luck:

Here, names is a list of strings:

reg_pattern = r"(?!\\s)(\\W[^\\W,]+)(?!,) and\\s([^ ]+ )([^ ]+)"
sub_pattern = r"\\1 \\3 \\2\\3"
cleaned_names = []
cleaned_names = [re.sub(reg_pattern, sub_pattern, name) for name in names]

The goal can be seen in the link above (particularly in the 'substitution' section at the bottom of that page). Ultimately, I need to append group 3 of the regex to the end of group 1.

Upvotes: 1

Views: 139

Answers (1)

Emma

Reputation: 27723

My guess is that you're trying to apply re.sub to couples' names, for which you could likely write an expression similar to:

([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)

This should work if you have no edge cases. If you do, you'd probably want to modify the character classes, such as [A-Z], and add the other characters you need to them.
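To make the groups concrete, here is a small sketch of what each one captures, plus one possible way to widen the first character class. The pattern e_wide and the sample name "D'Arcy" are assumptions made up for illustration; they are not part of the original data.

```python
import re

e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'

# Group 1: first given name; group 2: everything up to the last
# capitalized word (note the trailing space); group 3: that last word.
groups = re.match(e, 'Jan and Edgar Adelman').groups()
print(groups)  # ('Jan', 'Edgar ', 'Adelman')

# Hypothetical edge case: a first name containing an apostrophe or hyphen.
# The original class [a-z]+ stops at the apostrophe, so nothing matches.
no_match = re.match(e, "D'Arcy and Edgar Adelman")
print(no_match)  # None

# Assumed widening of the first class to allow letters, ' and -.
e_wide = r"([A-Z][A-Za-z'-]+)\s+and\s+(.*)([A-Z]\S*)"
widened = re.sub(e_wide, r'\1 \3 and \2\3', "D'Arcy and Edgar Adelman")
print(widened)  # D'Arcy Adelman and Edgar Adelman
```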

Demo

Test

import re

l = ['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.',
     'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']

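# group 1: first given name; group 2: text up to the last capitalized word;
# group 3: the last capitalized word (the shared surname)
e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'

l_out = []
for names in l:
    # re.match only matches at the start of the string, so entries such as
    # 'Bill Mack and Les Lieberman' are left untouched
    if re.match(e, names):
        # insert the surname (group 3) after the first given name (group 1)
        l_out.append(re.sub(e, r'\1 \3 and \2\3', names))
    else:
        l_out.append(names)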

print(l_out)

Output

['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan Adelman and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie Sorenson and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.', 'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly Murro and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
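The re.match guard in the test above matters because re.sub on its own searches anywhere in the string. A minimal sketch of the difference, assuming you would rather anchor the pattern with ^ than keep the if/else (the names anchored and expand are hypothetical helpers, not from the original code):

```python
import re

e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'
anchored = '^' + e  # assumption: only rewrite strings that start with "First and ..."

def expand(name):
    # Leaves the string unchanged when the anchored pattern does not match.
    return re.sub(anchored, r'\1 \3 and \2\3', name)

couple = expand('Jan and Edgar Adelman')
untouched = expand('Bill Mack and Les Lieberman')
print(couple)     # Jan Adelman and Edgar Adelman
print(untouched)  # Bill Mack and Les Lieberman

# Without the guard or the anchor, the match starts at 'Mack'
# and the surname is inserted in the wrong place.
unguarded = re.sub(e, r'\1 \3 and \2\3', 'Bill Mack and Les Lieberman')
print(unguarded)  # Bill Mack Lieberman and Les Lieberman
```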


Or you can try

import re

l = ['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.',
     'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']

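# group 1: first given name; group 2: text up to the last capitalized word;
# group 3: the last capitalized word (the shared surname)
e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'

l_out = []
for names in l:
    # re.match only matches at the start of the string
    if re.match(e, names):
        # split the couple into two separate entries, each with the surname
        l_out.append(re.sub(e, r'\1 \3', names))
        l_out.append(re.sub(e, r'\2\3', names))
    else:
        l_out.append(names)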

print(l_out)

Output

['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan Adelman', 'Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie Sorenson', 'George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.', 'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly Murro', 'Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
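The same split can also be sketched by reading the groups from the match object once instead of calling re.sub twice. This is only an alternative under the assumption that the match covers the whole string, as it does for the sample data; split_couple is a hypothetical helper name.

```python
import re

e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'

def split_couple(name):
    """Return two full names for 'First and Second Last', else [name]."""
    m = re.match(e, name)
    if not m:
        return [name]
    first, second, last = m.groups()
    # group 2 keeps its trailing space, mirroring r'\2\3' in the loop above
    return [f'{first} {last}', f'{second}{last}']

print(split_couple('Cassie and George Sorenson'))  # ['Cassie Sorenson', 'George Sorenson']
print(split_couple('Tom Scott and Mark Smith'))    # ['Tom Scott and Mark Smith']
```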


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also see in this link how it would match against some sample inputs.
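One more thing worth checking, assuming the doubled backslashes in the question's patterns are literal and not an artifact of how the code was pasted: inside a raw string, \\s is a backslash followed by the letter s, not the whitespace class, and \\1 in the replacement produces a literal \1 instead of group 1. A minimal check with made-up inputs:

```python
import re

# In a raw string, \\s looks for a literal backslash followed by 's'.
doubled = re.search(r"\\s", "Jan and Edgar")
single = re.search(r"\s", "Jan and Edgar")
print(doubled)         # None
print(single.start())  # 3 (the first space)

# In the replacement, \\1 is an escaped backslash plus '1', not group 1.
literal = re.sub(r"(Jan)", r"\\1", "Jan")
backref = re.sub(r"(Jan)", r"\1!", "Jan")
print(literal)  # \1
print(backref)  # Jan!
```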


Upvotes: 2
