Sachin Sinkar

Reputation: 177

Regex string pattern insert operation

Input String:

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good.

Current output:

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA, and -3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.

Expected output:

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA and interskin (IS)-3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.

The last item after "and" does not get its prefix: I expect interskin (IS)-3, but my output still has a bare -3. Please look at my code and suggest a fix.

import re
string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
print(string_a)
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")
print('\n')
print(pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f' and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a))

Upvotes: 1

Views: 60

Answers (1)

The fourth bird

Reputation: 163277

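Using your pattern and code, you can add optional uppercase characters [A-Z]* at the end of the repeated non-capturing group in the first alternative of capture group 2. Without it, the repetition stops right after -2 and leaves ALPHA unmatched, so the optional "and" part cannot match and -3 never gets its prefix.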

\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?
                                                           ^^^^^^
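
To see the difference, here is a small sketch that prints capture groups 2 and 3 for both patterns. The shortened `text` and the variable names `original` and `fixed` are only for illustration and are not part of the question's code.

```python
import re

# Shortened sample (illustrative): just the part of the sentence that fails.
text = "interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist."

# Pattern from the question: list items after the first may not carry letters.
original = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")

# Same pattern with [A-Z]* added inside the repeated group.
fixed = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")

for name, pattern in (("original", original), ("fixed", fixed)):
    m = pattern.search(text)
    # group(2) is the comma-separated list, group(3) the item after "and"
    print(name, repr(m.group(2)), repr(m.group(3)))

# original '-1ALPHA, -8, -97, -2' None
# fixed '-1ALPHA, -8, -97, -2ALPHA' '-3'
```

With the original pattern, group 2 ends at -2, so the text that follows is "ALPHA, and -3" and the optional "and" part has nothing to match.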


Example

import re
string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")
print(pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f' and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a))

Output

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA and interskin (IS)-3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.
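
If readability matters, the same idea could be written with a named replacement function instead of a lambda. This is only a sketch under one assumption that goes beyond the original pattern: every list item, including the one after "and", is `-<digits>` optionally followed by uppercase letters, which lets the two alternatives of group 2 collapse into one. The names `PATTERN` and `expand_prefix` are made up for this example.

```python
import re

# Assumption: each item is "-<digits>" with optional uppercase letters,
# so a single alternative covers both "-8" and "-2ALPHA".
PATTERN = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)"    # group 1: prefix, e.g. "interskin (IS)" or "LL"
    r"(\s*-\d+[A-Z]*(?:,*\s*-\d+[A-Z]*)*)"  # group 2: comma-separated items
    r"(?:,*\s*and\s+(-\d+[A-Z]*))?"         # group 3: optional last item after "and"
)

def expand_prefix(match):
    """Repeat the prefix in front of every item of the matched list."""
    prefix, items, last = match.group(1), match.group(2), match.group(3)
    parts = [prefix + item.strip() for item in items.split(",")]
    result = ", ".join(parts)
    if last:
        result += f" and {prefix}{last}"
    return result

string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
print(PATTERN.sub(expand_prefix, string_a))
```

For the sample string this prints the same text as the output above. Unlike the original pattern, it would also add the prefix to a last item such as "and -3BETA".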

Upvotes: 2
