Reputation: 123
I've got a file that looks like this:
foo: 11.00 12.00 bar 13.00
bar: 11.00 12.00 bar
foo: 11.00 12.00
and would like to extract all numbers in lines beginning with the keyword "foo:". Expected result:
['11.00', '12.00', '13.00']
['11.00', '12.00']
This is easy if I use two regexes, like this:
if re.match('^foo:', line):
    re.findall(r'\d+\.\d+', line)
but I was wondering whether it is possible to combine these into a single regex.
Thanks for your help, MD
Upvotes: 4
Views: 966
Reputation: 1237
Not exactly what you asked for, but since it's recommended to use standard Python tools instead of regexes where possible, I'd do something like this:
import re
with open('numbers.txt', 'r') as f:
    # keep the result; match the full keyword, colon included
    results = [re.findall(r'\d+\.\d+', line) for line in f if line.startswith('foo:')]
UPDATE
This variant returns the numbers after 'foo' even if it appears anywhere in the line rather than only at the beginning. Lines without 'foo' yield an empty list:
with open('numbers.txt', 'r') as f:
    results = [re.findall(r'\d+\.\d+', line.partition('foo')[2]) for line in f]
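To make the difference between the two variants concrete, here is a minimal sketch that runs both on the sample lines from the question. The in-memory list `lines` and the names `NUMBER`, `by_prefix` and `by_partition` are assumptions for illustration; a real script would iterate over the open file instead.

```python
import re

# Sample lines copied from the question (stand-in for reading a file).
lines = [
    "foo: 11.00 12.00 bar 13.00",
    "bar: 11.00 12.00 bar",
    "foo: 11.00 12.00",
]

# Compile once; the raw string keeps the backslashes intact.
NUMBER = re.compile(r"\d+\.\d+")

# Variant 1: keep only lines that start with the keyword.
by_prefix = [NUMBER.findall(line) for line in lines if line.startswith("foo:")]

# Variant 2: search whatever follows the first "foo" anywhere in the line.
# partition() returns an empty tail when "foo" is absent, so such lines
# contribute an empty list instead of being skipped.
by_partition = [NUMBER.findall(line.partition("foo")[2]) for line in lines]

print(by_prefix)     # [['11.00', '12.00', '13.00'], ['11.00', '12.00']]
print(by_partition)  # [['11.00', '12.00', '13.00'], [], ['11.00', '12.00']]
```

Only the first variant reproduces the expected result from the question exactly; the second one keeps a placeholder entry for every line.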
Upvotes: 4
Reputation: 3107
You can do without the first regexp: filter the lines in a list comprehension by comparing the first four characters of each line, and compile the inner regexp once:
import re
with open("input.txt", "r") as inp:
    prog = re.compile(r"\d+\.\d+")
    results = [prog.findall(line) for line in inp if line[:4] == "foo:"]
Upvotes: 0
Reputation: 409166
If every line in the file always contains the same number of numbers (three in this example), you can use the following regex:
r"^foo:[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)"
Example:
>>> import re
>>> line = "foo: 11.00 12.00 bar 13.00"
>>> re.match(r"^foo:[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)", line).groups()
('11.00', '12.00', '13.00')
Putting parentheses around part of a regular expression turns it into a group that can be extracted from the match object. See the Python documentation for more information.
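As a rough sketch of what the fixed-count restriction means in practice, the snippet below applies the same pattern to all three sample lines from the question. The helper name `three_numbers` and the constant `THREE_NUMBERS` are made up for illustration; the check for `None` is needed because `re.match` returns no match object when a line does not fit.

```python
import re

# Pattern from the answer above, written as a raw string.
# It has exactly three capturing groups, so it expects three numbers.
THREE_NUMBERS = re.compile(
    r"^foo:[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)"
)

def three_numbers(line):
    """Return the three captured numbers, or None when the line does not fit."""
    match = THREE_NUMBERS.match(line)
    return match.groups() if match else None

print(three_numbers("foo: 11.00 12.00 bar 13.00"))  # ('11.00', '12.00', '13.00')
print(three_numbers("foo: 11.00 12.00"))            # None (only two numbers)
print(three_numbers("bar: 11.00 12.00 bar"))        # None (wrong keyword)
```

The second sample line from the question does not match at all, which is why this approach only fits files where every "foo:" line carries the same number of values.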
Upvotes: 0