ajai biltu

Reputation: 55

Regex to read a file and return the first line after the matched pattern from inside the file in Python

Example string 1:

7.2.P.8.1 

Summary and Conclusion  


A stability study with two batches was carried out.

Example string 2:

7.2.S.1.2  

Structure 

Not applicable as the substance is not present.

I want to write a regex that finds a section number of the form (7.2.P.8.1) or (7.2.S.1.2) or (8-3-1-P-2), or any similar format (the parts are always separated by either . or -), and retrieves the first line after it. From the first instance I need the output (Summary and Conclusion), and from the second instance (Structure). The words 'Example string' are not part of the file content; they are only there to label the examples.

Occasionally the format may instead be:

9.2.P.8.1 Summary and Conclusion  

A stability study with two batches was carried out. 

In this case too, I want to retrieve as output: Summary and Conclusion

Note: I only want the first match in the file, not all matches, so my code should stop as soon as it finds the first matching pattern. How can I do this efficiently?

Code so far:

import re
def func():
    with open('/path/to/file.txt') as f: # Open the file (auto-close it too)
        for line in f: # Go through the lines one at a time
            m = re.match('\d+(?:[.-]\w+)*\s*', line) # Check each line
            if m: # If we have a match...
                return m.group(1) # ...return the value

Upvotes: 1

Views: 1040

Answers (1)

Wiktor Stribiżew

Reputation: 627607

You may use

import re

rx = re.compile(r'\d+(?:[.-]\w+)*\s*(\S.*)?$')
found = False
with open('/path/to/file.txt', 'r') as f:
    for line in f:
        if not found:                         # If the required line is not found yet
            m = rx.match(line.strip())        # Check if matching line found
            if m:                               
                if m.group(1):                # If Group 1 is not empty 
                    print(m.group(1))         # Print it
                    break                     # Stop processing
                else:                         # Else, the title is on the next non-blank line
                    found=True                # Set found flag to True
        else:
            if not line.strip():              # Skip blank line
                pass
            else:
                print(line.strip())           # Else, print the match
                break                         # Stop processing

See the Python demo and the regex demo.

NOTES

The \d+(?:[.-]\w+)*\s*(\S.*)?$ regex is applied with re.match, so it is anchored at the start of the stripped line. It matches 1+ digits, then 0 or more repetitions of . or - followed by 1+ word chars, then 0+ whitespace chars, and then optionally captures into Group 1 a non-whitespace char followed by any 0+ chars up to the end of the line. If Group 1 is not empty, the title is on the same line as the section number: it is printed and break stops processing.

Otherwise, the found boolean flag is set to True, and the next non-blank line is printed before the loop stops.
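If a return value is more convenient than printing, the same logic can be wrapped in a function. The following is only a sketch of that idea: the names SECTION_RX and first_heading are made up for illustration and do not appear in the code above, and the function accepts any iterable of lines (an open file object works) rather than a path.

```python
import re

# Same pattern as in the answer: a section number, then an optional title
# on the same line captured in Group 1. SECTION_RX is a hypothetical name.
SECTION_RX = re.compile(r'\d+(?:[.-]\w+)*\s*(\S.*)?$')

def first_heading(lines):
    """Return the title that follows the first section number, or None."""
    found = False                      # True once a bare section number was seen
    for raw in lines:
        line = raw.strip()
        if not line:                   # Blank lines never carry a title
            continue
        if found:                      # First non-blank line after the number
            return line
        m = SECTION_RX.match(line)
        if m:
            if m.group(1):             # Title is on the same line as the number
                return m.group(1)
            found = True               # Title is expected on a later line
    return None                        # No section number in the input
```

Passing an open file keeps the early exit, because the file is read lazily and the function returns as soon as the title is known:

with open('/path/to/file.txt') as f:
    print(first_heading(f))

Note that, as in the original pattern, any line that merely starts with a digit is treated as a section number, so the regex may need tightening for files where ordinary text lines begin with numbers.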

Upvotes: 2
