Reputation: 125
I have a file, example.txt, which contains hexadecimal data like this:
09 06 07 04 00 00 01 00 1d 03 4b 2c a1 2a 02 01
b7 09 01 47 30 12 a0 0a 80 08 33 04 03 92 22 14
07 f0 a1 0b 80 00 81 00 84 01 00 86 00 85 00 83
07 91 94 71 06 00 07 19
09 06 07 04 r0 00 01 00 1d 03 4b 2c a1 2a 02 01
b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14
07 f0 a1 0b 80 00 81 0d 84 01 00 86 00 85 00 83
07 91 94 71 06
09 06 07 04 r0 00 01 00 1d 03 4b 2c a1 2a 02 01
b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14
07 f0 a1 0b 80 00 81 0d 84 01 00 86 00 85 00 83
b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14
b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14
07 f0 a1 0b 80 00 81 0d 84 01 00 86 00 85
What I want to do is look for a specific string and, if it exists, continue from that point looking for another string, and so on. I also want to stop looking for that sequence when I reach a line break that ends the paragraph.
I mean, I have a huge text file that is divided into paragraphs, so I want to search for that pattern in each paragraph and, when the paragraph is finished, start the search again from the beginning.
This is what I've implemented so far, but I don't know how to express that the paragraph is over and that the search should start again from the beginning:
import os
import re

file_path = 'example.txt'
pattern = re.compile("12.*(?=[90|25|30]).*(?=40).*(?=20)")  # add a proper regex here to match all your required strings properly
with open(file_path) as file:
    tokens = re.findall(pattern, file.read())
if tokens:
    os.remove(file_path)
Upvotes: 0
Views: 512
Reputation: 1155
If I understand your question correctly, a for-loop over the file's lines that builds a separate string for each paragraph, which can then be queried, should work as follows:
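import os
import re

file_path = 'example.txt'
# Pattern copied unchanged from the question. Note that [90|25|30] is a
# character class (any one of 9, 0, |, 2, 5, 3), not an alternation.
pattern = re.compile("12.*(?=[90|25|30]).*(?=40).*(?=20)")

with open(file_path) as file:
    # create a list of paragraphs, treating blank lines as separators
    paragraphs = []
    cur_paragraph = ''
    for line in file:
        if line.strip() == '':
            if cur_paragraph:
                paragraphs.append(cur_paragraph.replace('\n', ' '))
            cur_paragraph = ''
        else:
            cur_paragraph += line
    # keep the last paragraph when the file does not end with a blank line
    if cur_paragraph:
        paragraphs.append(cur_paragraph.replace('\n', ' '))

# query each paragraph using the regex pattern; the file is already closed here
for paragraph in paragraphs:
    if pattern.search(paragraph):
        os.remove(file_path)
        break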
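If the goal really is "find one string, then continue from that point looking for the next", a step-by-step search may be easier to reason about than a single regex with lookaheads. The following is only a sketch under assumptions: the sequence `12`, then one of `90`/`25`/`30`, then `40`, then `20` is my reading of the pattern in the question, and the helper names `split_paragraphs`, `find_in_order` and `STEPS` are made up for illustration. It relies on `re.split` to cut the text at blank lines and on the `pos` argument of a compiled pattern's `search` method to resume where the previous match ended.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    blocks = re.split(r'\n\s*\n', text.strip())
    # Collapse each paragraph onto a single line with single spaces.
    return [' '.join(block.split()) for block in blocks if block.strip()]


def find_in_order(paragraph, steps):
    """Return True if every step matches, each one starting where the previous ended."""
    pos = 0
    for step in steps:
        match = step.search(paragraph, pos)
        if match is None:
            return False
        pos = match.end()
    return True


# Assumed sequence (hypothetical): "12", then "90"/"25"/"30", then "40", then "20".
STEPS = [re.compile(p) for p in (r'12', r'90|25|30', r'40', r'20')]

# Small made-up sample with three paragraphs separated by blank lines.
sample = "aa 12 bb 25 40\n20 cc\n\n12 40 20\n\n12 30 40 20\n"
paragraphs = split_paragraphs(sample)
matches = [find_in_order(p, STEPS) for p in paragraphs]
print(matches)
```

Because `find_in_order` is called once per paragraph, the search position resets to the start of the sequence at every paragraph boundary, which is the behaviour the question asks for. To apply it to the real file you would pass the result of `file.read()` to `split_paragraphs` instead of `sample`.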
Upvotes: 1