Reputation: 1101
I have a file containing data like this:
D11/22/1984 D 123 q423 ooo 11/22/1987
R11/22/1985 123 q423 ooo 11/22/1987
D12/24/1986 123 q423 ooo 11/22/1987
511/27/1987 123 q423 ooo 11/22/1987
D18/29/1988 123 q423 ooo 11/22/1987
I need to pick the first occurrence of a record matching the pattern ^D(\d{2}/\d{2}/\d{4}) and then stop traversing the rest of the file.
For example, in the data above I want to pick only the value D11/22/1984, and not D12/24/1986 or D18/29/1988.
I want to write it in Python using the re module.
Upvotes: 1
Views: 6564
Reputation: 2562
This regex captures only the first occurrence:
import re
filedata = '''
D11/22/1984 D 123 q423 ooo 11/22/1987
R11/22/1985 123 q423 ooo 11/22/1987
D12/24/1986 123 q423 ooo 11/22/1987
511/27/1987 123 q423 ooo 11/22/1987
D18/29/1988 123 q423 ooo 11/22/1987
'''
print(list(re.findall(r'^D(\d{2}/\d{2}/\d{4})?.*', filedata, flags=re.M|re.S)))
# ['11/22/1984']
Furthermore, re.search scans the string, returns only the first match, and then stops scanning (maybe this is what you are looking for):
print(re.search(r'^D(\d{2}/\d{2}/\d{4})', filedata, flags=re.M|re.S).groups())
# ('11/22/1984',)
# No need for the (...)?.* suffix here; your original pattern can be used.
With this regex, findall finds... all occurrences:
print(list(re.findall(r'^D(\d{2}/\d{2}/\d{4})', filedata, flags=re.M|re.S)))
# ['11/22/1984', '12/24/1986', '18/29/1988']
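One detail worth guarding against: re.search returns None when nothing matches, so calling .groups() directly on the result would raise an AttributeError. A minimal sketch of a safer wrapper follows; the helper name first_date is hypothetical, and the sample data is copied from the question.

```python
import re

# Sample data copied from the question.
filedata = """\
D11/22/1984 D 123 q423 ooo 11/22/1987
R11/22/1985 123 q423 ooo 11/22/1987
D12/24/1986 123 q423 ooo 11/22/1987
511/27/1987 123 q423 ooo 11/22/1987
D18/29/1988 123 q423 ooo 11/22/1987
"""

# re.M lets ^ match at the start of every line, not only the start of the string.
pattern = re.compile(r'^D(\d{2}/\d{2}/\d{4})', flags=re.M)


def first_date(text):
    """Return the first captured date in text, or None if nothing matches.

    Hypothetical helper, not part of the re module.
    """
    m = pattern.search(text)
    return m.group(1) if m else None


print(first_date(filedata))
print(first_date("no dates here"))
```

This still requires the whole file content to be in memory as one string; it only limits how far the regex engine scans within that string.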
Upvotes: 1
Reputation: 142156
You can build a generator over your file object (the following assumes it's called f) which applies your re.match, then take the first occurrence of a match, e.g.:
matches = (re.match(r'D(\d{2}/\d{2}/\d{4})', line) for line in f)
first_match = next((match.group(1) for match in matches if match), None)
If you get None, then no matches were found. You can also extend this to easily take the first n matches:
from itertools import islice
# Python 3: the built-in filter is lazy (Python 2 needed itertools.ifilter)
first5 = list(islice(filter(None, matches), 5))
If you then get an empty list, no matches were found.
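To make the lazy behaviour concrete, here is a small self-contained sketch for Python 3. It assumes an io.StringIO object as a stand-in for the open file f, and it yields the captured date strings rather than match objects; both choices are illustrative only.

```python
import io
import re
from itertools import islice

# io.StringIO stands in for the open file object f (an assumption for this demo).
f = io.StringIO(
    "D11/22/1984 D 123 q423 ooo 11/22/1987\n"
    "R11/22/1985 123 q423 ooo 11/22/1987\n"
    "D12/24/1986 123 q423 ooo 11/22/1987\n"
    "511/27/1987 123 q423 ooo 11/22/1987\n"
    "D18/29/1988 123 q423 ooo 11/22/1987\n"
)

pattern = re.compile(r'D(\d{2}/\d{2}/\d{4})')

# Lazy pipeline: a line is read and matched only when a value is requested.
dates = (m.group(1) for m in map(pattern.match, f) if m)

first_match = next(dates, None)    # reads only as far as the first matching line
next_two = list(islice(dates, 2))  # continues from where the generator stopped

print(first_match)
print(next_two)
```

Because the generator is shared, the islice call resumes after the first match instead of starting over from the top of the file.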
Upvotes: 3
Reputation:
You can use a function that iterates over the file object with a for-loop and returns as soon as it finds the first match:
import re
def func():
    with open('/path/to/file.txt') as f:  # Open the file (auto-close it too)
        for line in f:  # Go through the lines one at a time
            m = re.match(r'D(\d{2}/\d{2}/\d{4})', line)  # Check each line
            if m:  # If we have a match...
                return m.group(1)  # ...return the value
Iterating over a file object yields its lines one by one, so we only check as many lines as necessary.
Also, I removed the ^ from your pattern since re.match already matches from the start of the string by default.
If you already have a file object open, just remove the with-statement and pass the file as an argument to the function:
import re
def func(f):
    for line in f:  # Go through the lines one at a time
        m = re.match(r'D(\d{2}/\d{2}/\d{4})', line)  # Check each line
        if m:  # If we have a match...
            return m.group(1)  # ...return the value
Just remember to close the file when you are done with it.
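As a quick check that the loop really stops early, the sketch below runs the same logic against an in-memory io.StringIO object (a stand-in for a real file, assumed here for illustration) and then reads the next line to show that the rest of the input was left untouched. The name first_date_in is hypothetical.

```python
import io
import re


def first_date_in(f):
    """Return the first date on a line starting with 'D', or None if there is none."""
    for line in f:  # lines are read one at a time
        m = re.match(r'D(\d{2}/\d{2}/\d{4})', line)
        if m:
            return m.group(1)  # stop reading as soon as a line matches
    return None  # explicit result when no line matches


# io.StringIO stands in for an already-open file (an assumption for this demo).
sample = io.StringIO(
    "R11/22/1985 123 q423 ooo 11/22/1987\n"
    "D12/24/1986 123 q423 ooo 11/22/1987\n"
    "511/27/1987 123 q423 ooo 11/22/1987\n"
)

result = first_date_in(sample)
remaining = sample.readline()  # the line after the match has not been consumed yet

print(result)
print(repr(remaining))
```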
Upvotes: 1
Reputation: 7948
You could have the pattern itself consume the rest of your data, so that only the first record can match, like so:
^D(\d{2}/\d{2}/\d{4})[\s\S]+
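A short sketch of how this pattern behaves with re.findall, assuming the file content has already been read into a string (sample data copied from the question) and that re.M is set so ^ can match at any line start:

```python
import re

# Sample data copied from the question.
filedata = (
    "D11/22/1984 D 123 q423 ooo 11/22/1987\n"
    "R11/22/1985 123 q423 ooo 11/22/1987\n"
    "D12/24/1986 123 q423 ooo 11/22/1987\n"
    "511/27/1987 123 q423 ooo 11/22/1987\n"
    "D18/29/1988 123 q423 ooo 11/22/1987\n"
)

# [\s\S]+ matches any character, newlines included, so the first match
# swallows everything after the date and leaves nothing for a second match.
pattern = r'^D(\d{2}/\d{2}/\d{4})[\s\S]+'
found = re.findall(pattern, filedata, flags=re.M)

print(found)
```

Note that the greedy tail still walks over the remaining text, so this limits the number of matches rather than the amount of data scanned.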
Upvotes: 1