Tesla001

Reputation: 551

How to use regex to copy a section from a file?

I have a file with the following structure:

******
Block 1
text
text
...
End 
******
Block 2
text
text
...
End 
******
Block 3
text
text
...
End 
******

and so on. I want to open the file, read each line, and save the contents of the first block in a string. This is what I have so far:

Block = ''
with open(File) as file:
    for line in file:
        if re.match('\.Block.*', line):
            Block += line
        if 'str' in line:
            break
print(Block)

However, when I print Block I am getting:

Block 1
Block 2
...

How can I use my regex to copy the lines from Block 1 to End? Thank you

Upvotes: 0

Views: 167

Answers (3)

kantal

Reputation: 2407

import re

with open(File) as ff:
    txt = ff.read()  # read the whole file into one string

re.findall(r"(?ms)^\s*Block\s*\d+.*?^\s*End\s*$", txt)

 Out: 
        ['Block 1\ntext\ntext\n...\nEnd ',
         'Block 2\ntext\ntext\n...\nEnd ',
         'Block 3\ntext\ntext\n...\nEnd ']

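        To get only the first block, change '\d+' to '1' (note that 'Block\s*1' also matches Block 10, Block 11, and so on) or take the first element of the result.
        (?ms) sets two flags:
               m: multiline mode, so ^ and $ match at the start and end of each line,
               s: '.' matches newlines too.
        The ? in '.*?' makes the match non-greedy, so each match stops at the nearest End line.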

Upvotes: 0

Peter L.

Reputation: 1

Your code appends only the lines that match the regex itself, so you never collect the lines between the Block header and End. To copy a whole block, track whether you are currently inside one with a flag:

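import re

Block = ''
Match = False  # True while we are inside the first block
with open(File) as file:
    for line in file:
        # Start copying at the first line that begins with "Block"
        # (no leading '\.', which would require a literal dot).
        if re.match(r'Block', line):
            Match = True
        if Match:
            Block += line
        # Stop after the "End" line; it may carry trailing whitespace.
        if Match and re.match(r'End\s*$', line):
            break
print(Block)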

Upvotes: 0

Ajax1234

Reputation: 71471

You can use itertools.groupby:

import itertools, re

with open('filename.txt') as f:
    lines = [i.strip('\n') for i in f]
# Group consecutive lines by whether they are a separator row of asterisks.
groups = itertools.groupby(lines, key=lambda x: bool(re.match(r'\*+$', x)))
first_result, *_ = [list(b) for a, b in groups if not a]
print(first_result)

Output:

['Block 1', 'text', 'text', '...', 'End ']
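
The same grouping can collect every block rather than only the first. This is a small sketch, assuming separator rows consist only of asterisks; the helper name split_blocks and the sample lines are made up for illustration.

```python
import itertools
import re

def split_blocks(lines):
    """Split lines into blocks, using rows made only of '*' as separators."""
    def is_separator(x):
        return bool(re.match(r'\*+$', x))
    # groupby yields runs of consecutive lines sharing the same key;
    # keep only the runs that are not separator rows.
    return [list(group)
            for sep, group in itertools.groupby(lines, key=is_separator)
            if not sep]

# Hypothetical sample, already stripped of newlines.
sample = ['******', 'Block 1', 'text', 'End ',
          '******', 'Block 2', 'text', 'End ', '******']
blocks = split_blocks(sample)
print(blocks[0])
```

Indexing blocks[0] gives the first block, and the remaining blocks stay available without reading the file again.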

Upvotes: 1
