Reputation: 177
Input String:
The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good.
Output String:
The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA, and -3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.
Expected Output is:
The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA and interskin (IS)-3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.
My output is missing the interskin (IS)-3 part: the last item, after "and", is not expanded. What do I need to change in my code?
import re
string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
print(string_a)
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")
print('\n')
print(pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f' and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a))
Upvotes: 1
Views: 60
Reputation: 163277
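With your pattern and code, the repeated part (?:,*\s*-\d+)* matches only the digits of -2ALPHA, so the match stops at -2 and the optional "and" part cannot match what follows. Add optional uppercase characters, [A-Z]*, at the end of that repeated group, in the first alternative of the second capture group:
\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?
                                                           ^^^^^^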
Example
import re
string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")
print(pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f' and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a))
Output
The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA and interskin (IS)-3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.
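To see why the extra [A-Z]* matters, it can help to print what each pattern actually matches. The following is only a sketch of one way to check this: the names PREFIX, TAIL, original, fixed, and expand are introduced here for illustration and are not part of the question's code. The expand function is intended as a more readable equivalent of the lambda above.

```python
import re

string_a = (
    "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, "
    "and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in "
    "keratinocytes was good."
)

# Shared pieces of both patterns (hypothetical names, for readability only).
PREFIX = r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)"   # group 1: the label to repeat
TAIL = r"(?:,*\s*and\s+(-\d+))?"                # group 3: optional item after "and"

# Original: items after the first one may not carry an uppercase suffix.
original = re.compile(
    PREFIX + r"(\s*-\d+[A-Z]+(?:,*\s*-\d+)*|\s*-\d+(?:,*\s*-\d+)*)" + TAIL
)
# Fixed: [A-Z]* lets later items such as -2ALPHA match in full.
fixed = re.compile(
    PREFIX + r"(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)" + TAIL
)


def expand(match):
    """Put the label (group 1) in front of every item of the matched list."""
    prefix = match.group(1)
    items = [item.strip() for item in match.group(2).split(",")]
    result = ", ".join(prefix + item for item in items)
    if match.group(3):
        result += f" and {prefix}{match.group(3)}"
    return result


# Compare the first match of each pattern.
for name, pattern in (("original", original), ("fixed", fixed)):
    m = pattern.search(string_a)
    print(name, "matched:", repr(m.group(0)), "| group 3:", m.group(3))

print(fixed.sub(expand, string_a))
```

With the original pattern the first match ends at -2 and group 3 is None, which is why "ALPHA, and -3" is left untouched. With the fixed pattern the match runs through "and -3", so group 3 holds -3 and the label is added to it.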
Upvotes: 2