Reputation: 477
The following regex isn't replacing substrings as expected.
I've tried running the code with the following modifications (one at a time, of course) all with no luck:
reg_pattern = r"(?!\\s)(\\W[^\\W,]+)(?!,) and\\s([^ ]+ )([^ ]+)"
sub_pattern = r"\\1 \\3 \\2\\3"
cleaned_names = []
cleaned_names = [re.sub(reg_pattern, sub_pattern, name) for name in names]
The goal can be seen in the link above (particularly in the 'substitution' section at the bottom of that page), but ultimately, I need to append group3 of the regex to the end of group1.
Upvotes: 1
Views: 139
Reputation: 27723
I'm guessing that you're trying to use re.sub on the couples' names, for which you can likely write an expression similar to:
([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)
This works if you have no edge cases. If you do, you'd probably want to modify the character classes, such as [A-Z], and add the other characters you need to them.
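To see what each group captures, here is a small sketch that runs the expression on one of the sample names. The second pattern is only an illustration of widening a character class: `loose_pattern` and the name 'Mary-Ann and Tom Murro' are assumptions, not part of the original data.

```python
import re

# The expression from above, unchanged.
pattern = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'

m = re.match(pattern, 'Jan and Edgar Adelman')
groups = m.groups()
print(groups)  # ('Jan', 'Edgar ', 'Adelman')

# Hypothetical edge case: a hyphenated first name. Group 1 only allows
# lowercase letters after the initial capital, so the strict pattern fails.
strict_hit = re.match(pattern, 'Mary-Ann and Tom Murro')
print(strict_hit)  # None

# Assumed fix: widen group 1 to accept more letters, apostrophes, and hyphens.
loose_pattern = r"([A-Z][A-Za-z'-]+)\s+and\s+(.*)([A-Z]\S*)"
loose_hit = re.match(loose_pattern, 'Mary-Ann and Tom Murro')
print(loose_hit.groups())  # ('Mary-Ann', 'Tom ', 'Murro')
```

Because `(.*)` is greedy, it backtracks only as far as the last capitalized token on the line, which is why group 3 ends up holding the shared surname.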
import re
l = ['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.',
'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'
l_out = []
for names in l:
    if re.match(e, names):
        l_out.append(re.sub(e, r'\1 \3 and \2\3', names))
    else:
        l_out.append(names)
print(l_out)
['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan Adelman and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie Sorenson and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.', 'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly Murro and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
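The re.match check in the loop matters: re.match only succeeds at the start of the string, whereas re.sub on its own searches anywhere and would also rewrite 'Bill Mack and Les Lieberman'. A possible shortening, sketched here under the assumption that every entry is a single line, is to anchor the expression with `^` and let re.sub return non-matching strings unchanged. The names `anchored` and `sample` are illustrative only.

```python
import re

# Same expression as above, anchored at the start of the string so that
# re.sub behaves like the re.match guard.
anchored = r'^([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'

sample = ['Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Chuck Grodin']

# re.sub returns the input untouched when the pattern does not match,
# so no if/else is needed.
fixed = [re.sub(anchored, r'\1 \3 and \2\3', name) for name in sample]
print(fixed)
# ['Jan Adelman and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Chuck Grodin']

# Without the anchor, the second entry would be changed as well.
unanchored = re.sub(anchored.lstrip('^'), r'\1 \3 and \2\3', sample[1])
print(unanchored)  # 'Bill Mack Lieberman and Les Lieberman'
```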
Or, to put each person of a couple in a separate entry, you can try:
import re
l = ['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.',
'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'
l_out = []
for names in l:
    if re.match(e, names):
        l_out.append(re.sub(e, r'\1 \3', names))
        l_out.append(re.sub(e, r'\2\3', names))
    else:
        l_out.append(names)
print(l_out)
['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan Adelman', 'Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie Sorenson', 'George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.', 'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly Murro', 'Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
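The splitting variant runs the expression three times per couple (one re.match and two re.sub calls). As a rough alternative, the groups can be read from a single match object instead. This is only a sketch: `split_couple` is a hypothetical helper, and it assumes the same expression and single-line inputs as above.

```python
import re

pattern = re.compile(r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)')

def split_couple(name):
    """Return a list with one entry per person (hypothetical helper)."""
    m = pattern.match(name)
    if m is None:
        # Not a "First and First Last" couple: keep the entry as it is.
        return [name]
    first, middle, last = m.groups()
    # Keep any text after the match, as re.sub would.
    tail = name[m.end():]
    return [f'{first} {last}{tail}', f'{middle}{last}{tail}']

sample = ['Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Kelly and Tom Murro']
flat = [person for name in sample for person in split_couple(name)]
print(flat)
# ['Jan Adelman', 'Edgar Adelman', 'Bill Mack and Les Lieberman', 'Kelly Murro', 'Tom Murro']
```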
If you wish to simplify, modify, or explore the expression, it's explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.
Upvotes: 2