adiletgov

Reputation: 51

How to use regex to delete a specific pattern from each line of a Python string?

I have a string in the following format.

Test 2 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 3 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 4 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 5 Lorem ipsum dolor sit amet consectetur adipisicing elit.

How can I delete Test 2, Test 3, and so on, so that the string looks like this?

Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.

I have tried:

import re

test1 = re.compile(r'^Test \d ')
test2 = re.compile(r'^Test \d\d ')
text = re.sub(test1, '', text)
text = re.sub(test2, '', text)

But it didn't work.

Upvotes: 3

Views: 109

Answers (2)

RavinderSingh13

Reputation: 133428

Based on your samples, please try the following. It also works when a line starts with several consecutive "Test <number>" prefixes.

import re

var = """Test 2 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 3 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 4 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 5 Lorem ipsum dolor sit amet consectetur adipisicing elit."""
# re.M (multiline) makes ^ match at the start of every line, not only at the start of the string.
print(re.sub(r'^(Test\s+\d+)(\s+Test\s+\d+)*\s*', '', var, flags=re.M))

Explanation: This uses Python's re module and its re.sub function to replace every match of the regex in var with an empty string. The re.M flag makes ^ match at the beginning of every line, not only at the beginning of the whole string.

Explanation of regex:

^(Test\s+\d+)       ##From the start of each line, match Test, then 1 or more whitespace characters, then 1 or more digits.
(\s+Test\s+\d+)*    ##Match 1 or more whitespace characters, Test, 1 or more whitespace characters and 1 or more digits; this group may repeat 0 or more times.
\s*                 ##Match 0 or more trailing whitespace characters.
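The handling of repeated prefixes can be checked with a small sketch. The sample lines below (one with two prefixes, one with a two-digit number, one with no prefix) are made up for illustration; the pattern is the one from this answer.

```python
import re

# Hypothetical sample: two stacked prefixes, a two-digit number, and a line without a prefix.
var = """Test 2 Test 3 Lorem ipsum dolor sit amet.
Test 10 Lorem ipsum dolor sit amet.
Lorem ipsum without a prefix."""

# Same pattern as above; re.M lets ^ match at the start of every line.
cleaned = re.sub(r'^(Test\s+\d+)(\s+Test\s+\d+)*\s*', '', var, flags=re.M)
print(cleaned)
```

One caveat: \s also matches newlines, so a line consisting only of "Test 2" would be merged with the following line. If that can happen in your data, using a literal space (or [ \t]) instead of \s avoids it.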

Upvotes: 3

mkrieger1

Reputation: 23142

Assuming that you have a single multi-line string, then

test1 = re.compile(r'^Test \d ')
text = re.sub(test1, '', text)

does in fact remove Test 2 from the first line of the string, but leaves all other lines unchanged, because by default ^ matches only at the beginning of the whole string, not at the beginning of each line.
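A minimal sketch of this default behaviour, using a shortened two-line sample made up for illustration:

```python
import re

text = "Test 2 Lorem ipsum.\nTest 3 Lorem ipsum."

# Without re.M, ^ matches only at position 0 of the whole string,
# so only the first line loses its prefix.
result = re.sub(r'^Test \d ', '', text)
print(result)
```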

You can change that by using the re.M flag:

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line

>>> test1 = re.compile(r'^Test \d ', flags=re.M)
>>> text = '''\
... Test 2 Lorem ipsum dolor sit amet consectetur adipisicing elit.
... Test 3 Lorem ipsum dolor sit amet consectetur adipisicing elit.
... Test 4 Lorem ipsum dolor sit amet consectetur adipisicing elit.
... Test 5 Lorem ipsum dolor sit amet consectetur adipisicing elit.
... '''
>>> print(re.sub(test1, '', text))
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.

Alternatively, split the string into lines and apply your original pattern without re.M to each line separately:

>>> test1 = re.compile(r'^Test \d ')
>>> [re.sub(test1, '', line) for line in text.splitlines()]
['Lorem ipsum dolor sit amet consectetur adipisicing elit.',
 'Lorem ipsum dolor sit amet consectetur adipisicing elit.',
 'Lorem ipsum dolor sit amet consectetur adipisicing elit.',
 'Lorem ipsum dolor sit amet consectetur adipisicing elit.']
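If you split the text but later need a single string again, one option (a sketch, not the only way) is to rejoin the cleaned lines with newline characters. This uses the compiled pattern's own sub method, which behaves like re.sub(test1, ...):

```python
import re

text = """Test 2 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 3 Lorem ipsum dolor sit amet consectetur adipisicing elit.
"""

test1 = re.compile(r'^Test \d ')
# Clean each line on its own; here ^ is the start of each line string.
lines = [test1.sub('', line) for line in text.splitlines()]
# Join the cleaned lines back into one multi-line string.
cleaned = '\n'.join(lines)
print(cleaned)
```

Note that splitlines() drops the trailing newline, so the rejoined string does not end with one.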

Depending on whether you want to continue processing the text as a whole, or each line separately (or maybe you already have each line separately as input to your program), one or the other option may be more practical.

The test1 pattern works only for single-digit numbers after 'Test ', and the test2 pattern works only for two-digit numbers. To handle any number of digits with a single pattern, change \d or \d\d to \d+.
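With \d+, one pattern covers both of the original cases, so a single substitution is enough. A sketch combining it with re.M; the sample lines, including Test 10 and Test 123, are hypothetical:

```python
import re

text = """Test 2 Lorem ipsum dolor sit amet.
Test 10 Lorem ipsum dolor sit amet.
Test 123 Lorem ipsum dolor sit amet."""

# \d+ matches one or more digits, replacing both \d and \d\d;
# re.M makes ^ match at the start of every line.
pattern = re.compile(r'^Test \d+ ', flags=re.M)
cleaned = pattern.sub('', text)
print(cleaned)
```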

Upvotes: 3
