Reputation: 41
I want to find the number matching my pattern on every line of a .txt file. Text fragment:
sometext - 0.007442749125388171
sometext - 0.004296183916209439
sometext - 0.0037923667088698393
sometext - 0.003137404884873018
Code:
import codecs
import re

file = codecs.open(FILEPATH, encoding='utf-8')
for cnt, line in enumerate(file):
    result_text = re.match(r'[a-zżźćńółęąś]*', line).group()
    result_value = re.search(r'[0-9].[0-9]*', line).group()
    print("Line {}: {}".format(cnt, line))
It's strange, because re.search does find a match:
<_sre.SRE_Match object; span=(8, 28), match='0.001879612135574806'>
but when I assign the result to a variable, I get this error:
File "read.py", line 18, in <module>
result_value = re.search(r'[0-9].[0-9]*', line).group()
AttributeError: 'NoneType' object has no attribute 'group'
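A minimal reproduction of the error, assuming the file contains at least one line without any digits (for example a trailing empty line; the question does not show the whole file). re.search returns None when nothing in the line matches, and calling .group() on None raises exactly this AttributeError:

```python
import re

pattern = r'[0-9].[0-9]*'  # the pattern from the question, unchanged

# A line containing a number matches
matched = re.search(pattern, "sometext - 0.007442749125388171")
print(matched.group())

# A line without any digit (e.g. an empty trailing line) gives None
missing = re.search(pattern, "\n")
print(missing)

# Calling .group() on None reproduces the traceback above
try:
    missing.group()
except AttributeError as exc:
    print("AttributeError:", exc)
```

Note also that the unescaped . in this pattern matches any character, so \. is needed to match only a literal decimal point.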
Upvotes: 0
Views: 2204
Reputation: 18950
I'd like to suggest a tighter regex definition:
^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$
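Explanation
^ asserts the beginning of the line
([a-zżźćńółęąś]+) captures the leading word (one or more lowercase letters, Polish diacritics included)
\s+-\s+ matches the separator in between, with a variable number of spaces
(\d+\.\d+) captures the decimal number
$ asserts the end of the line
import re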
regex = r"^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$"
test_str = ("sometext - 0.007442749125388171\n"
            "sometext - 0.004296183916209439\n"
            "sometext - 0.0037923667088698393\n"
            "sometext - 0.003137404884873018")
# re.MULTILINE makes ^ and $ match at each line boundary, not only at the ends of the string
for match in re.finditer(regex, test_str, re.MULTILINE):
    # Capture groups are numbered from 1 (group 0 is the whole match)
    for group_num, group in enumerate(match.groups(), start=1):
        print("Group {}: {}".format(group_num, group))
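A sketch of applying the same regex to a file line by line instead of to one multi-line string, assuming the data comes from a UTF-8 text file. Here io.StringIO stands in for the open file, and parse_lines is a hypothetical helper name; lines that do not match (such as blank lines) are skipped rather than raising AttributeError:

```python
import io
import re

# Same pattern as above; without MULTILINE, ^ and $ anchor the single line passed in
# ($ also matches just before the line's trailing newline)
LINE_RE = re.compile(r"^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$")


def parse_lines(lines):
    """Yield (word, value) for every line that matches LINE_RE (hypothetical helper)."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or malformed lines instead of crashing
        word, number = match.groups()
        yield word, float(number)


# io.StringIO stands in for open(FILEPATH, encoding="utf-8")
sample = io.StringIO(
    "sometext - 0.007442749125388171\n"
    "sometext - 0.004296183916209439\n"
    "\n"
)
results = list(parse_lines(sample))
print(results)
```

Compiling the pattern once and checking for None per line keeps the loop safe on files that contain empty or unexpected lines.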
Upvotes: 1
Reputation: 1099
When capturing a group in a regular expression, you need to put parentheses around the group that you aim to capture. Also, you need to pass the index of the group you want to capture to the group() method.
For example, for your second match, the code should be modified as follows:
# There is only 1 group here, so we pass index 1
result_value = re.search(r'([0-9]\.[0-9]*)', line).group(1)  # \. matches a literal dot
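A short illustration of how group numbering works, using a hypothetical pattern with two groups on one of the sample lines. group() with no argument (or group(0)) returns the whole match, while group(1), group(2), and so on return what each parenthesised group captured. When the only group wraps the entire pattern, as in the example above, group() and group(1) return the same text:

```python
import re

line = "sometext - 0.007442749125388171"
# Hypothetical two-group pattern: the word, then the number
match = re.search(r'([a-z]+) - ([0-9]\.[0-9]*)', line)

print(match.group())   # whole match, same as match.group(0)
print(match.group(1))  # first group: the word
print(match.group(2))  # second group: the number
print(match.groups())  # all captured groups as a tuple
```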
As suggested in the comments on your question, you may also want to check whether a match was found before trying to extract the captured groups:
import re

with open("file.txt", encoding="utf-8") as text_file:
    for i, line in enumerate(text_file):
        text_matches = re.match(r'([a-zżźćńółęąś]*)', line)
        if text_matches is None:
            continue  # no match on this line: skip it instead of calling .group() on None
        text_result = text_matches.group(1)
        value_matches = re.search(r'([0-9]\.[0-9]*)', line)
        if value_matches is None:
            continue
        value_result = value_matches.group(1)
        print("Line {}: {} {}".format(i, text_result, value_result))
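One caveat about the first check, shown in a small sketch: because the text pattern uses *, it can match zero letters, so re.match returns an empty match rather than None even on a blank line, and that guard never skips anything. Switching to + (one or more letters) makes it return None for such lines:

```python
import re

blank = "\n"

# '*' allows zero letters, so the match succeeds with an empty group
star = re.match(r'([a-zżźćńółęąś]*)', blank)
print(star is None, repr(star.group(1)))

# '+' requires at least one letter, so the same line yields None
plus = re.match(r'([a-zżźćńółęąś]+)', blank)
print(plus is None)
```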
Upvotes: 2