Reputation: 177
I need to prepend a word to every bare -number in a sentence: the word that came before the nearest preceding -number. Please see the input string and expected output string below for clarification. I have tried the str .replace and re .sub methods with static, hard-coded patterns, but that only gives a hand-manipulated output rather than a general solution.
Input String:
The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.
Expected Output String:
The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.
Code:
import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
regex1 = re.findall(r"[a-z]+\s+\(+[A-Z]+\)+-\d+\,\s+-\d\,+", string_a)
regex2 = re.findall(r"[A-Z]+-\d+\,\s+-\d\,\s+-\d\,\s+-\d\,\s+[a-z]+\s+-\d+", string_a)
Upvotes: 3
Views: 73
Reputation: 627292
You can use
import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")
print( pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f', and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a) )
# => The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.
See the Python demo and a regex demo.
Details
\b - word boundary
([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+) - Capturing group 1: one or more ASCII letters, then zero or more whitespaces, a ( char, one or more uppercase ASCII letters, and a ) char, OR one or more uppercase ASCII letters
(\s*-\d+(?:,\s*-\d+)*) - Capturing group 2: zero or more whitespaces, a - char, one or more digits, and then zero or more sequences of a comma, zero or more whitespaces, a - char and one or more digits
(?:,\s*and\s+(-\d+))? - an optional non-capturing group: a comma, zero or more whitespaces, the word and, one or more whitespaces, then Capturing group 3: a - char and one or more digits.
The Group 1 value is prepended to all Group 2 comma-separated numbers inside a lambda used as a replacement argument.
If Group 3 matched, "and" + space + the concatenated Group 1 and Group 3 values are appended.
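For readability, the one-line lambda can also be unpacked into a named replacement function. The following is only a sketch of the same logic described above, with the same pattern split across lines; the names PATTERN, expand_match and expand_prefixes are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, split per capturing group for readability.
PATTERN = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)"  # group 1: prefix, e.g. "interleukin (IL)" or "MMP"
    r"(\s*-\d+(?:,\s*-\d+)*)"             # group 2: "-1, -2, -3" style number list
    r"(?:,\s*and\s+(-\d+))?"              # group 3 (optional): the number after ", and"
)


def expand_match(match):
    """Build the replacement text for one match (hypothetical helper)."""
    prefix = match.group(1)
    # Split the comma-separated numbers and drop surrounding whitespace.
    numbers = [part.strip() for part in match.group(2).split(",")]
    # Prepend the prefix to every number in the list.
    expanded = ", ".join(prefix + number for number in numbers)
    # If a trailing ", and -N" was captured, append it with the prefix too.
    if match.group(3):
        expanded += f", and {prefix}{match.group(3)}"
    return expanded


def expand_prefixes(text):
    """Apply the expansion to every match in the text (hypothetical helper)."""
    return PATTERN.sub(expand_match, text)


string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
print(expand_prefixes(string_a))
```

Passing a function to sub instead of a lambda does not change the behavior; it just gives each step a name and a place for comments.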
Upvotes: 2