krizz

Reputation: 125

Finding a string matching a given pattern and separating lines using Python's re module

In an arbitrary string I need to find a substring matching a given pattern and put ; after it. I think I should use re to do it, but I am not that familiar with it.

Example input:

this is the first part of string 1/32 part this is the second part of string

As a result, I need to put ; after 1/32 part, e.g.:

this is the first part of string 1/32 part; this is the second part of string

I know I should use re, probably re.match with a pattern looking like [1-1000]/[1-1000]\spart, but I'm not sure where to go from here.

Edit: 1/32 is just an example; it can also be 65/123, 1/3, 6/7, etc.

Upvotes: 1

Views: 105

Answers (2)

Wolf

Reputation: 10238

Your use case is called substitution. This is exactly what the re.sub function is for.

import re

s = "bla 1/6 part bla bla 76/88 part 12345/12345 part bla"
print(s)
s = re.sub(r'(\b\d{1,4}/\d{1,4} part)', r'\1;', s)
print(s)

The output of the second print is

bla 1/6 part; bla bla 76/88 part; 12345/12345 part bla

Note that there is no ; after the last occurrence of part, because 12345/12345 has more than four digits in each number.

I used {} quantifiers to limit the numerator and denominator of the fractions to 4 decimal digits, which is roughly what your [1-1000] notation suggests. It could be approximated even better by 1?\d{1,3}, but that is still not exactly the same: it also allows, for example, 1999/1999 [1].
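A quick way to see how loose these approximations are is to test a few numbers against each pattern with re.fullmatch. This is a minimal sketch; the sample numbers and the accepts helper are illustrative, not part of the answer above.

```python
import re

# The two approximations of "a number from 1 to 1000" discussed above
LOOSE = r'\d{1,4}'     # any 1- to 4-digit string
CLOSER = r'1?\d{1,3}'  # an optional leading 1, then 1 to 3 digits


def accepts(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


for n in ['0', '7', '999', '1000', '1999', '5000']:
    print(f'{n:>5}  loose={accepts(LOOSE, n)}  closer={accepts(CLOSER, n)}')
```

Both patterns accept 0 and 1999; only 5000 is rejected by 1?\d{1,3}, which is why an exact pattern needs a different structure.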


[1] P.S. As tripleee commented, the exact regular expression for decimal numbers from 1 to 1000 is [1-9]([0-9][0-9]?)?|1000. It looks a bit complicated, but the structure becomes obvious if you separate out 1000, the only 4-digit number, and add a superfluous pair of parentheses to the 1- to 3-digit part: [1-9]([0-9]([0-9])?)?. Another option is to use the character class shortcut \d for [0-9], which gives [1-9]\d{0,2}|1000.
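A sketch of plugging the exact 1 to 1000 pattern into the substitution. Because | has the lowest precedence in a regex, the number pattern has to be wrapped in a non-capturing group (?:...) before it is combined with the rest; otherwise the alternation would split the whole pattern. The names NUM, PATTERN and add_semicolon are illustrative.

```python
import re

# Exact 1..1000, wrapped in (?:...) so the | only applies to the number itself
NUM = r'(?:[1-9]\d{0,2}|1000)'

# \b stops a match from starting in the middle of a longer number;
# the literal ' part' after the denominator does the same at the other end
PATTERN = re.compile(rf'(\b{NUM}/{NUM} part)')


def add_semicolon(text):
    """Append ';' after every '<num>/<num> part' whose numbers are 1..1000."""
    return PATTERN.sub(r'\1;', text)


print(add_semicolon('bla 1/6 part bla 1000/1000 part 0/5 part 1001/2 part bla'))
```

Here 1/6 and 1000/1000 get a semicolon, while 0/5 and 1001/2 are left alone because 0 and 1001 fall outside the range.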

Edit:

  • Combined the match grouping.
  • Added the anchor before the numerator.

Upvotes: 4

user6165050


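You can use re.search and re.sub from the re module, along with the regex below. Note that re.match only matches at the beginning of the string, so it would return None for your example.

import re

my_str = 'this is the first part of string 1/32 part this is the second part of string'
my_regex = r'(\d+/\d+\s+part)'

# re.search looks for a match anywhere in the string
if re.search(my_regex, my_str):
    # prints: this is the first part of string 1/32 part; this is the second part of string
    print(re.sub(my_regex, r'\1;', my_str))
    # ...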
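The difference between re.match and re.search matters here: re.match only tries to match at position 0, while re.search scans for the first match anywhere in the string. A minimal comparison on the question's input:

```python
import re

my_str = 'this is the first part of string 1/32 part this is the second part of string'
my_regex = r'(\d+/\d+\s+part)'

# re.match anchors at position 0, where the string starts with "this", so it fails
print(re.match(my_regex, my_str))  # None

# re.search scans the whole string and finds the fraction in the middle
found = re.search(my_regex, my_str)
print(found.group(1))  # 1/32 part
```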

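Bear in mind that, depending on the pattern, you may need extra flags when the input spans multiple lines, for example re.DOTALL so that . also matches a newline, or re.MULTILINE so that ^ and $ match at the start and end of every line. The re module documentation lists all available flags.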
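As a rough illustration of when a flag matters (the sample text is made up): the fraction pattern needs no flag across line breaks, because \s also matches a newline, whereas the .* in a pattern of the form (.*)(\s+\d+/\d+\s+part)(.*) stops at a newline unless re.DOTALL is passed.

```python
import re

text = 'first line\nsecond line 1/32 part end'

# No flag needed: \s+ and \d+ do not depend on newline handling here
fixed = re.sub(r'(\d+/\d+\s+part)', r'\1;', text)
print(repr(fixed))

# Without re.DOTALL, '.' does not match '\n', so the leading (.*) cannot
# reach past the first line and re.match fails
wrap = r'(.*)(\s+\d+/\d+\s+part)(.*)'
print(re.match(wrap, text))                         # None
print(re.match(wrap, text, re.DOTALL) is not None)  # True
```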



An alternative (there might be better ways) is to also match the parts before and after the desired part and reassemble the string, something like:

import re

my_str = 'this is the first part of string 1/32 part this is the second part of string'
my_regex = r'(.*)(\s+\d+/\d+\s+part)(.*)'

condition = re.match(my_regex, my_str)

if condition:
    # The pattern spans the whole string, so this returns just group 2 followed by ';'
    part = re.sub(my_regex, r'\2;', my_str)
    # Reassemble inside the if, so nothing runs when there is no match
    x = condition.group(1) + part + condition.group(3)
    print(x)

Which will output the modified string:

this is the first part of string 1/32 part; this is the second part of string

A simple one-line function that does the same would be:

import re


def modify_string(my_str, my_regex):
    return re.sub(my_regex, r'\1;', my_str)

if __name__ == '__main__':
    print(modify_string('first part of string 1/32 part second part of string', r'(\d+/\d+\s+part)'))

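But I'd still recommend keeping the condition, in case you need to know whether anything was replaced; re.sub itself simply returns the string unchanged when there is no match.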
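If the reason for the condition is to know whether anything was replaced, re.subn can do both in one call: it returns a (new_string, number_of_substitutions) tuple. A sketch; modify_string_checked is a hypothetical variant of the function above.

```python
import re


def modify_string_checked(my_str, my_regex):
    """Append ';' after each match and report how many replacements were made."""
    # re.subn returns a tuple: (new_string, number_of_substitutions)
    new_str, count = re.subn(my_regex, r'\1;', my_str)
    return new_str, count


result, n = modify_string_checked('first part of string 1/32 part second part of string',
                                  r'(\d+/\d+\s+part)')
print(n, result)
```

A count of 0 tells you nothing matched, without a separate re.search call.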

Upvotes: 4
