Replace the value with the previous occurrence of acronym using python regular expression module

Question

I need to prepend the term that precedes the first -number in a list to every later bare -number in the sentence. Please see the input string and the expected output string below for clarification. I have tried str.replace and re.sub, but only with hard-coded replacements, which amounts to manipulating the output by hand.

Input String:

The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.

Expected Output String:

The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

Code:

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
regex1 = re.findall(r"[a-z]+\s+\(+[A-Z]+\)+-\d+\,\s+-\d\,+", string_a)
regex2 = re.findall(r"[A-Z]+-\d+\,\s+-\d\,\s+-\d\,\s+-\d\,\s+[a-z]+\s+-\d+", string_a)

Wiktor Stribiżew · Accepted Answer

You can use

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")
print( pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f', and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a) )
# => The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

See the Python demo and a regex demo.

Details

\b - word boundary
([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+) - Capturing group 1: one or more ASCII letters, then zero or more whitespaces, (, one or more uppercase ASCII letters, and a ), OR one or more uppercase ASCII letters
(\s*-\d+(?:,\s*-\d+)*) - Capturing group 2: zero or more whitespaces, -, one or more digits, and then zero or more sequences of a comma, zero or more whitespaces, - and one or more digits
(?:,\s*and\s+(-\d+))? - an optional non-capturing group: a comma, zero or more whitespaces, and, one or more whitespaces, then a Capturing group 3: -, one or more digits.
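To see what each group captures on the sample sentence, here is a minimal sketch. It assumes the parentheses in the pattern are escaped as \( and \); the names text and captured are illustrative only.

```python
import re

pattern = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)"  # group 1: "word (ACRONYM)" or a bare acronym
    r"(\s*-\d+(?:,\s*-\d+)*)"             # group 2: comma-separated -numbers
    r"(?:,\s*and\s+(-\d+))?"              # group 3: optional last -number after ", and"
)
text = ("The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, "
        "LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.")

# One tuple of (group 1, group 2, group 3) per match; group 3 is None when absent.
captured = [m.groups() for m in pattern.finditer(text)]
for groups in captured:
    print(groups)
# ('interleukin (IL)', '-1, -8', None)
# ('LL', '-37', None)
# ('MMP', '-1, -2, -3, -9', '-13')
```

Note that LL-37 also matches, but since its Group 2 holds a single number the replacement leaves it unchanged.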

The Group 1 value is prepended to all Group 2 comma-separated numbers inside a lambda used as a replacement argument.

If Group 3 matched, ", and " followed by the concatenated Group 1 and Group 3 values is appended.
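The same replacement logic may be easier to follow when written as a named function instead of a one-line lambda. This is only a sketch of the steps described above; the names expand and expand_prefixes are hypothetical, and the pattern is assumed to use escaped parentheses.

```python
import re

PATTERN = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?"
)

def expand(match):
    """Build the replacement text for one match."""
    prefix = match.group(1)
    # Split Group 2 on commas and prepend the prefix to every -number.
    numbers = [part.strip() for part in match.group(2).split(",")]
    result = ", ".join(prefix + number for number in numbers)
    # Group 3 is the optional final -number that follows ", and".
    if match.group(3):
        result += f", and {prefix}{match.group(3)}"
    return result

def expand_prefixes(text):
    """Repeat the preceding term in front of every bare -number."""
    return PATTERN.sub(expand, text)

string_a = ("The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, "
            "LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.")
print(expand_prefixes(string_a))
```

A named function also makes it simpler to test the replacement separately from the pattern.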
