kubeq24

Reputation: 41

Python re.search finds a result but group() doesn't work

I want to find the number matching my pattern in every line of a .txt file. Text fragment:

sometext - 0.007442749125388171
sometext - 0.004296183916209439
sometext - 0.0037923667088698393
sometext - 0.003137404884873018

Code:

file = codecs.open(FILEPATH, encoding='utf-8')
for cnt, line in enumerate(file):
    result_text = re.match(r'[a-zżźćńółęąś]*', line).group()
    result_value = re.search(r'[0-9].[0-9]*', line).group()
    print("Line {}: {}".format(cnt, line))

It's strange because re.search finds results:

<_sre.SRE_Match object; span=(8, 28), match='0.001879612135574806'>

but when I try to assign the result to a variable, I get this error:

File "read.py", line 18, in <module>
result_value = re.search(r'[0-9].[0-9]*', line).group()
AttributeError: 'NoneType' object has no attribute 'group'

Upvotes: 0

Views: 2204

Answers (2)

wp78de

Reputation: 18950

I'd like to suggest a tighter regex definition:

^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$


Explanation

  • re.MULTILINE flag (multi-line mode): causes ^ and $ to match at the beginning/end of each line, not only at the beginning/end of the whole string
  • ^ asserts the beginning of the line
  • ([a-zżźćńółęąś]+) capture group to match the "identifier"
  • \s+-\s+ the separator in-between with a variable number of spaces
  • (\d+\.\d+) matches the decimal number
  • $ asserts the end of the line

Sample Code:

import re
regex = r"^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$"
test_str = ("sometext - 0.007442749125388171\n"
    "sometext - 0.004296183916209439\n"
    "sometext - 0.0037923667088698393\n"
    "sometext - 0.003137404884873018")

matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    # groups() holds the captured groups; number them from 1 like match.group(n)
    for group_num, group in enumerate(match.groups(), start=1):
        print("Group {}: {}".format(group_num, group))
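
To apply the same pattern line by line, as in the question's loop, a minimal sketch could look like the following. The helper name parse_lines and the sample lines are assumptions made up for illustration; lines that do not match (a blank line, for example) are skipped instead of raising AttributeError.

```python
import re

# Same pattern as above, compiled once; re.MULTILINE is not needed
# because each line is matched on its own.
LINE_RE = re.compile(r"^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$")

def parse_lines(lines):
    """Yield (identifier, value) for each matching line; skip the rest."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:  # blank or malformed line
            continue
        yield match.group(1), float(match.group(2))

# Hypothetical sample input, including a blank line
sample = [
    "sometext - 0.007442749125388171\n",
    "\n",
    "sometext - 0.004296183916209439\n",
]
results = list(parse_lines(sample))
for name, value in results:
    print(name, value)
```

Any iterable of strings works here, so an open file object can be passed in place of the sample list.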

Upvotes: 1

Pierre

Reputation: 1099

To capture a specific part of a match, put parentheses around that part of the pattern and pass the index of the group you want to the group() method. (Calling group() with no argument returns the entire match.)

For example, for your second match, the code should be modified as follows:

# There is only 1 group here, so we pass index 1
# (the dot is escaped so that it matches a literal decimal point)
result_value = re.search(r'([0-9]+\.[0-9]+)', line).group(1)
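
For what it's worth, an AttributeError on 'NoneType' generally means re.search found nothing on that particular line and returned None. The sketch below shows both cases; the sample strings are assumed, and a blank line is only a guess at what the failing line in the file looks like.

```python
import re

pattern = r'[0-9]+\.[0-9]+'

# A line that contains a number: search returns a match object
hit = re.search(pattern, "sometext - 0.003137404884873018")
print(hit.group())  # the whole match; same as hit.group(0)

# A line without a number (here: a blank line): search returns None
miss = re.search(pattern, "\n")
print(miss)  # calling miss.group() at this point would raise AttributeError
```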

As suggested in the comments on your question, you should also check whether a match was found before extracting the captured groups, because re.match and re.search return None when nothing matches:

import re

with open("file.txt", encoding="utf-8") as text_file:
    for i, line in enumerate(text_file):
        # + instead of * so that an empty match counts as "not found"
        text_matches = re.match(r'([a-zżźćńółęąś]+)', line)
        if text_matches is None:
            continue

        text_result = text_matches.group(1)

        # Escape the dot so it matches a literal decimal point
        value_matches = re.search(r'([0-9]+\.[0-9]+)', line)
        if value_matches is None:
            continue

        value_result = value_matches.group(1)

        print("Line {}: {} - {}".format(i, text_result, value_result))

Upvotes: 2
