Sachin Sinkar

Reputation: 177

Replace a value with the previous occurrence of an acronym using the Python regular expression module

I need to prepend the preceding word (the acronym or name) to every bare -number in the sentence that appears without one. Please see the input string and expected output string below for clarification. I have tried the .replace and .sub methods, but only in a static, hard-coded way, which amounts to manipulating the output by hand.

Input String:

The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.

Expected Output String:

The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

Code:

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
regex1 = re.findall(r"[a-z]+\s+\(+[A-Z]+\)+-\d+\,\s+-\d\,+", string_a)
regex2 = re.findall(r"[A-Z]+-\d+\,\s+-\d\,\s+-\d\,\s+-\d\,\s+[a-z]+\s+-\d+", string_a)

Upvotes: 3

Views: 73

Answers (1)

Wiktor Stribiżew

Reputation: 627292

You can use

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")
print( pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f', and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a) )
# => The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

See the Python demo and a regex demo.

Details

  • \b - word boundary
  • ([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+) - Capturing group 1: one or more ASCII letters, then zero or more whitespaces, (, one or more uppercase ASCII letters, and a ), OR one or more uppercase ASCII letters
  • (\s*-\d+(?:,\s*-\d+)*) - Capturing group 2: zero or more whitespaces, -, one or more digits, and then zero or more sequences of a comma, zero or more whitespaces, - and one or more digits
  • (?:,\s*and\s+(-\d+))? - an optional non-capturing group: a comma, zero or more whitespaces, and, one or more whitespaces, then a Capturing group 3: -, one or more digits.
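To make the group breakdown above concrete, here is a small sketch that lists what each capturing group holds for every match in the sample sentence. The pattern and input are taken unchanged from the answer; the variable name captured is only for illustration.

```python
import re

pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."

# One tuple per match: (group 1, group 2, group 3).
# Group 3 is None when there is no trailing ", and -N" part.
captured = [m.groups() for m in pattern.finditer(string_a)]
for groups in captured:
    print(groups)
# ('interleukin (IL)', '-1, -8', None)
# ('LL', '-37', None)
# ('MMP', '-1, -2, -3, -9', '-13')
```

Note that LL-37 is also matched, but since it carries only one number the replacement leaves it unchanged.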

The Group 1 value is prepended to all Group 2 comma-separated numbers inside a lambda used as a replacement argument.

If Group 3 matched, a comma, a space, the word and, another space, and then the concatenated Group 1 and Group 3 values are appended.
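If the one-line lambda is hard to follow, the same replacement logic can be written as a named function. This is only a sketch meant to behave like the lambda above; the names PATTERN, expand and expand_acronyms are illustrative and not part of the original answer.

```python
import re

PATTERN = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")

def expand(match):
    """Build the replacement text for a single match."""
    prefix = match.group(1)                                    # e.g. "MMP"
    suffixes = [s.strip() for s in match.group(2).split(",")]  # e.g. ["-1", "-2"]
    result = ", ".join(prefix + s for s in suffixes)
    if match.group(3):                                         # trailing ", and -N"
        result += f", and {prefix}{match.group(3)}"
    return result

def expand_acronyms(text):
    # re.sub calls expand() once per match and inserts its return value.
    return PATTERN.sub(expand, text)

string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
print(expand_acronyms(string_a))
# => The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.
```

Keeping the logic in a named function makes it easier to test against other sentences and to adjust if the lists use a different conjunction or separator.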

Upvotes: 2
