Reputation: 691
I have a string as shown below:
string=" (2021-07-04 11:58:43 PM BST)
--- len (Tradition ) says to sen Hi yohan(2021-07-05 12:04:42 AM BST)
--- len (Tradition) says to yohan okay -5 / 0 .(2021-07-04 11:47:14 PM BST)
--- Ke Ch says to Hano hello(2021-07-05 12:09:41 AM BST)
--- len says to yohan sen yes -5 / 0 TN -- / +2.5
Processed by wokl Archive for son malab | 2021-07-05 12:26:44 AM
BST
---"
I want to extract the text after "says to" and before the timestamp.
Expected output:
text=['yohan sen Hi yohan','yohan sen okay -5 / 0 ','Han Cho hello','sen yes -5 / 0 TN -- / +2.5']
What I have tried:
text=re.findall(r'\bsays to (.*(?:\n(?!\(\d|---).*?)*?)\s*\n(?:\(\d|---)', string)
Upvotes: 3
Views: 564
Reputation: 623
(?<=says\sto)[\s\S]*?(?=\(\d{4}-\d{2}-\d{2}\s(\d\d:){2}\d{2}\s\w{2}\s\w{3}\))
You need lookahead and lookbehind assertions for this. To solve your problem you need one lookbehind, which is 'says to', and one lookahead, which is the date pattern.
(?<=fixed_length_regex)   lookbehind (must be fixed-width in Python's re module)
(?=regex)                 lookahead (may be any length)
So essentially what you are looking for would look something like this:
look-behind     | pattern               | look-ahead
________________|_______________________|__________________
(?<=says\sto)   | match_everything_here | (?=date_pattern)
which is equivalent to the regex at the top of this answer.
You can play around with the solution in regex101 here: https://regex101.com/r/rPFDo9/1/
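As a rough sketch of how this pattern could be used from Python: the inner group is written as non-capturing, (?:\d\d:){2}, because re.findall returns only the group contents when the pattern contains a capturing group. The shortened sample text and the names sample, pattern and messages are made up for illustration.

```python
import re

# Hypothetical, shortened sample in the same shape as the question's string.
sample = (
    "--- len says to sen Hi yohan(2021-07-05 12:04:42 AM BST)\n"
    "--- Ke Ch says to Hano hello(2021-07-05 12:09:41 AM BST)\n"
)

pattern = re.compile(
    r"(?<=says\sto)"   # lookbehind: the match must be preceded by "says to"
    r"[\s\S]*?"        # lazily match anything, including newlines
    # lookahead: a timestamp such as (2021-07-05 12:04:42 AM BST) must follow
    r"(?=\(\d{4}-\d{2}-\d{2}\s(?:\d\d:){2}\d{2}\s\w{2}\s\w{3}\))"
)

# No capturing groups, so findall returns the full matches; strip the
# leading space left over after "says to".
messages = [m.strip() for m in pattern.findall(sample)]
print(messages)
```

A message that is not followed by any timestamp, like the last one in the question, is not matched by this pattern.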
Upvotes: 1
Reputation: 133458
With the samples shown, please try the following Python code. Written and tested in Python 3.
import re
# Define the variable string with the asker's text here (omitted because it is too long).
var1 = re.findall(r'says\s+[^(]*',string,re.M)
The above creates a list named var1 whose elements end with trailing newlines, so remove them with Python's strip function:
var1 = list(map(lambda s: s.strip(), var1))
Now print all the elements of the var1 list:
for element in var1:
    print(element)
Explanation: The regex is simple. re.findall is used with the pattern says\s+[^(]* which matches from says, followed by whitespace, up to just before the next occurrence of ( in the text.
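Note that the pattern above keeps the words "says to" in each match. A possible variant, not part of the code above, is to wrap the part after "says to" in a capture group so that re.findall returns only that part. This is a minimal sketch; the sample text and the helper name extract_messages are hypothetical.

```python
import re

# Hypothetical, shortened sample in the same shape as the question's string.
sample = (
    "--- len (Tradition) says to yohan okay -5 / 0 .(2021-07-04 11:47:14 PM BST)\n"
    "--- Ke Ch says to Hano hello(2021-07-05 12:09:41 AM BST)\n"
)

def extract_messages(text):
    """Return the text after 'says to', up to the next '(' character."""
    # With exactly one capture group, findall returns only that group.
    found = re.findall(r"says\s+to\s+([^(]*)", text)
    # Strip surrounding whitespace (for example a trailing newline).
    return [m.strip() for m in found]

print(extract_messages(sample))
```

Because [^(]* stops at the first opening parenthesis, a message that itself contains "(" would be cut short, and a message with no "(" after it runs on to the end of the string.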
Upvotes: 2
Reputation: 784998
You may use this regex:
says\s+to\s+((?:.+\n)+)
RegEx Details:
says\s+to\s+ : Matches says to followed by 1+ whitespaces
((?:.+\n)+) : Matches 1+ non-empty lines and captures them in group #1
Python Code:
matches = re.findall(r'says\s+to\s+((?:.+\n)+)', string)
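To see what this pattern captures, here is a small sketch on a made-up input. It assumes a layout where each message spans one or more lines and is followed by a blank line; that layout, the sample text and the name blocks are assumptions for illustration, not taken from the question.

```python
import re

# Hypothetical layout: each message block ends at a blank line.
sample = (
    "len says to yohan\n"
    "okay\n"
    "\n"
    "Ke Ch says to Hano\n"
    "hello\n"
    "there\n"
    "\n"
)

# Group 1 collects consecutive non-empty lines after "says to";
# the first empty line ends the block because .+ needs at least one character.
matches = re.findall(r"says\s+to\s+((?:.+\n)+)", sample)

# Each capture ends with a newline, so strip it.
blocks = [m.strip() for m in matches]
print(blocks)
```

If a timestamp sits on a non-empty line directly after the message, it ends up inside the captured block as well and would need to be removed separately.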
Upvotes: 1