Reputation: 473
I have blocks of text that contain strings like the one below. I need to get the text on either side of "rt", including "rt" itself, but excluding any text or numbers on other lines.
Example:
1.99
Jim Smith rt Tom Ross
Random
So, here the desired result would be "Jim Smith rt Tom Ross".
I am new to regex and cannot get close. I think I need to lookahead and lookbehind then bound the result in some way but I'm struggling.
Any help would be appreciated.
Upvotes: 1
Views: 326
Reputation: 133610
With your shown samples, please try the following regex:
^\d+(?:\.\d+)?\n+\s*(.*?rt[^\n]+)\n+\s*\S+$
Python3 code: Written for Python 3. It uses the findall function of Python's re module with the re.M flag enabled, so that ^ and $ match at the start and end of each line of the multi-line value.
import re
var = """1.99
Jim Smith rt Tom Ross
Random"""
print(re.findall(r'^\d+(?:\.\d+)?\n+\s*(.*?rt[^\n]+)\n+\s*\S+$', var, re.M))
# ['Jim Smith rt Tom Ross']
Explanation of regex:
^\d+ ##From the start of a line, match 1 or more digits.
(?:\.\d+)? ##In an optional non-capturing group, match a literal dot followed by 1 or more digits.
\n+\s* ##Followed by 1 or more newlines and 0 or more whitespace characters, so the next line may or may not be indented.
(.*?rt[^\n]+) ##In a CAPTURING GROUP, lazily match up to the first "rt", then everything up to the end of that line.
\n+\s*\S+$ ##Followed by newline(s), then 0 or more whitespace characters and 1 or more non-whitespace characters at the end of a line.
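The pattern in this answer anchors on the numeric line before and the word line after the target. If that surrounding context is not actually required, a simpler line-based variant may be enough. The following is only a sketch under that assumption: it relies on the fact that `.` does not match a newline by default, and it adds `\b` word boundaries (not part of the answer above) so that "rt" inside a longer word is ignored.

```python
import re

text = """1.99
Jim Smith rt Tom Ross
Random"""

# With re.M, ^ and $ anchor at each line; '.' never crosses a newline,
# so every match is confined to a single line.
# \b...\b requires "rt" to be a standalone word.
pattern = re.compile(r'^.*\brt\b.*$', re.M)

# strip() removes any leading/trailing indentation from the matched line.
matches = [m.strip() for m in pattern.findall(text)]
print(matches)  # ['Jim Smith rt Tom Ross']
```

This trades strictness for simplicity: any line containing a standalone "rt" is returned, regardless of what precedes or follows it.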
Upvotes: 1
Reputation: 522064
We can use re.findall here with an appropriate regex pattern:
import re
inp = """1.99
Jim Smith rt Tom Ross
Random"""
matches = re.findall(r'\w+(?: \w+)* rt \w+(?: \w+)*', inp)
print(matches) # ['Jim Smith rt Tom Ross']
Explanation of regex:
\w+ ##match a single word
(?: \w+)* ##followed by a space and another word, zero or more times
 rt  ##match a space, followed by 'rt' and another space
\w+ ##match another word
(?: \w+)* ##which is followed by a space and another word, zero or more times
Upvotes: 1
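If the two sides of "rt" are needed separately, the same word-based pattern can be wrapped in capturing groups. This is a small sketch rather than part of the answer above; the group names `left` and `right` are illustrative.

```python
import re

inp = """1.99
Jim Smith rt Tom Ross
Random"""

# Same building blocks as the pattern above, with each side captured.
# "left" and "right" are hypothetical names chosen for this example.
pattern = re.compile(r'(?P<left>\w+(?: \w+)*) rt (?P<right>\w+(?: \w+)*)')

pairs = [(m.group('left'), m.group('right')) for m in pattern.finditer(inp)]
print(pairs)  # [('Jim Smith', 'Tom Ross')]
```

Because the words are joined by literal single spaces, the match cannot run across a newline, which is what keeps "1.99" and "Random" out of the result.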