Reputation: 33
Below is a sample substring from a much larger string (detaildesc_final) that I have obtained. I need a regex search over that string that retrieves every line in the [Data] section that begins with "[]" (two square brackets). All such lines should be retrieved until the [Logs] line is encountered.
[Data]
[] some text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[Logs]
I'm using Python, and I tried the following call, which is clearly incorrect:
re.findall(r'\b\\[\\]\w*', detaildesc_final)
I need the result to be in the following format:
some text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
I have already searched a lot online, but I could only figure out how to match lines starting with a single character rather than two ([] in this case). Any help would be greatly appreciated. Thank you.
Upvotes: 0
Views: 183
Reputation: 12679
You need a positive lookbehind:
import re
pattern=r'(?<=\[\])(.\w.+)'
string_1="""[Data]
[] some text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[Logs]"""
match = re.finditer(pattern, string_1, re.M)
for item in match:
    print(item.group(1))
output:
some text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
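One detail worth checking: the first `.` in the pattern above consumes the space that follows "[]", so each captured group starts with a space. The sketch below compares the original pattern with a variant that moves the space into the lookbehind. The variant and the shortened `sample` string are illustrative assumptions, not part of the answer.

```python
import re

# Shortened, hypothetical version of the sample data.
sample = "[Data]\n[] some text\n[] some_other_text\n[Logs]"

# Original pattern: the first `.` matches the space after "[]".
with_space = re.findall(r'(?<=\[\])(.\w.+)', sample)

# Variant (an assumption): the space sits inside the lookbehind,
# so it is excluded from the match.
without_space = re.findall(r'(?<=\[\] ).+', sample)

print(with_space)     # [' some text', ' some_other_text']
print(without_space)  # ['some text', 'some_other_text']
```

If the leading space is unwanted, calling `.strip()` on each group is another option.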
Regex explanation:
Positive lookbehind (?<=\[\]): tells the regex engine to temporarily step backwards in the string to check whether the text inside the lookbehind can be matched there.
\[ matches the character [ literally.
\] matches the character ] literally.
. matches any character (except line terminators).
\w matches any word character (equivalent to [a-zA-Z0-9_]).
.+ matches any character (except line terminators) between one and unlimited times, as many times as possible, giving back as needed (greedy).
Upvotes: 1
Reputation: 3547
import re
re.findall(r'^\[\] (.*)$', detaildesc_final, re.M)
Output:
['some text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text',
'some_other_text']
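The question also asks that only lines inside the [Data] section be returned. A minimal two-step sketch of that, assuming the section headers sit on their own lines: first isolate the text between [Data] and [Logs], then collect the "[] " lines within it. The sample string here is a hypothetical stand-in for the real detaildesc_final.

```python
import re

# Hypothetical stand-in for the real detaildesc_final string.
detaildesc_final = """[Header]
[] not wanted
[Data]
[] some text
[] some_other_text
[Logs]
[] also not wanted
"""

# Step 1: isolate the text between the "[Data]" and "[Logs]" lines.
section = re.search(r'^\[Data\]\n(.*?)^\[Logs\]', detaildesc_final,
                    re.M | re.S)

# Step 2: within that section, capture what follows "[] " on each line.
items = re.findall(r'^\[\] (.*)$', section.group(1), re.M) if section else []

print(items)  # ['some text', 'some_other_text']
```

Splitting the work in two keeps each pattern short; a single pattern that enforces the section boundary is possible but harder to read.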
Upvotes: 1
Reputation: 2055
import re
text = """
[Data]
[] some text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[] some_other_text
[Logs]
"""
# Remove every bracketed tag ("[]", "[Data]", "[Logs]") plus one trailing space.
print(re.sub(r"\[[a-zA-Z ]*\] ?", '', text))
output:
some text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
some_other_text
Upvotes: 1
Reputation: 5249
Don't over-complicate things.
for line in detaildesc_final.split('\n'):
    if line.startswith('[]'):
        do_something()
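A slightly fuller sketch of the same line-by-line idea, assuming the [Data] and [Logs] headers each sit on their own line. The function name `data_items` and the `sample` string are made up for illustration. It tracks whether the loop is inside the [Data] section and strips the "[]" prefix from each kept line.

```python
def data_items(text):
    """Return the text after "[]" for lines between [Data] and [Logs]."""
    items = []
    in_data = False
    for line in text.splitlines():
        if line.strip() == '[Data]':
            in_data = True           # entering the section of interest
        elif line.strip() == '[Logs]':
            in_data = False          # leaving it again
        elif in_data and line.startswith('[]'):
            items.append(line[2:].strip())
    return items


# Hypothetical sample data.
sample = "[Data]\n[] some text\n[] some_other_text\n[Logs]\n[] ignored"
print(data_items(sample))  # ['some text', 'some_other_text']
```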
Upvotes: 1