Luka

Reputation: 125

Search for a specific string in a paragraph

I have an example.txt that contains hexadecimal data like this:

09 06 07 04 00 00 01 00 1d 03 4b 2c a1 2a 02 01   
b7 09 01 47 30 12 a0 0a 80 08 33 04 03 92 22 14   
07 f0 a1 0b 80 00 81 00 84 01 00 86 00 85 00 83   
07 91 94 71 06 00 07 19

09 06 07 04 r0 00 01 00 1d 03 4b 2c a1 2a 02 01   
b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14   
07 f0 a1 0b 80 00 81 0d 84 01 00 86 00 85 00 83   
07 91 94 71 06 

09 06 07 04 r0 00 01 00 1d 03 4b 2c a1 2a 02 01   
b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14   
07 f0 a1 0b 80 00 81 0d 84 01 00 86 00 85 00 83   
b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14

b7 09 01 47 30 1s a0 0a 80 08 33 04 03 92 22 14   
07 f0 a1 0b 80 00 81 0d 84 01 00 86 00 85

What I want to do is look for a specific string and, if it exists, continue from that point looking for another string, and so on. I also want to stop looking for that sequence when I reach a blank line.

In other words, I have a huge text file that is divided into paragraphs. I want to search for the pattern in each paragraph and, when a paragraph ends, restart the search from the first string.

This is what I've implemented so far, but I don't know how to express that a paragraph is over and that the search should start again from the beginning:

import os
import re

file_path = 'example.txt'
pattern = re.compile("12.*(?=[90|25|30]).*(?=40).*(?=20)")  # add a proper regex here to match all you required strings properly

with open(file_path) as file:
    tokens = re.findall(pattern, file.read())

if tokens:
    os.remove(file_path)

Upvotes: 0

Views: 512

Answers (1)

M.G.Poirot

Reputation: 1155

If I understand your question correctly, a for loop over file.readlines() that builds separate paragraph strings, which can then be queried one by one, should work as follows:

import os
import re

file_path = 'example.txt'

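# (?=90|25|30) is an alternation of whole values; a character class such as
# [90|25|30] would match any single one of those characters instead.
pattern = re.compile(r"12.*(?=90|25|30).*(?=40).*(?=20)")

with open(file_path) as file:
    # create a list of paragraphs, each joined into a single line
    paragraphs = []
    cur_paragraph = ''
    for line in file.readlines():
        if line.strip() == '':
            # a blank line ends the current paragraph
            if cur_paragraph:
                paragraphs.append(cur_paragraph.replace('\n', ' '))
            cur_paragraph = ''
        else:
            cur_paragraph += line
    # the last paragraph may not be followed by a blank line
    if cur_paragraph:
        paragraphs.append(cur_paragraph.replace('\n', ' '))

# query each paragraph using the regex pattern; the file is closed at this
# point, so it can be removed
for paragraph in paragraphs:
    if pattern.search(paragraph):
        os.remove(file_path)
        break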
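
If the values are meant to match as whole bytes rather than as substrings (so that looking for 12 does not also hit the tail of "01 2a"), a token-based check may be easier to reason about than a regex. The following is only a sketch under that assumption. The names split_paragraphs, contains_in_order and steps are made up for illustration, and the values in steps are taken from the pattern in the question.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines and join each paragraph into one line."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


def contains_in_order(paragraph, steps):
    """Return True if each step is matched by a token, in the given order.

    `steps` is a list of sets; a step is satisfied by the first token at or
    after the current position that is a member of the set.
    """
    tokens = paragraph.split()
    position = 0
    for alternatives in steps:
        for i in range(position, len(tokens)):
            if tokens[i] in alternatives:
                position = i + 1  # continue searching after this token
                break
        else:
            return False  # paragraph ended before this step was found
    return True


# Hypothetical search sequence: 12, then one of 90/25/30, then 40, then 20.
steps = [{"12"}, {"90", "25", "30"}, {"40"}, {"20"}]
```

Because the search position is reset for every paragraph, each paragraph is checked from the first step again, for example with any(contains_in_order(p, steps) for p in split_paragraphs(text)).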

Upvotes: 1
