Ussabin

Reputation: 75

Python RE, AttributeError: 'tuple' object has no attribute 'group'

I'm trying to use Python 2.7 regular expressions to retrieve data from sample web pages that have been provided in a course I'm taking. The code I'm trying to get to work is:

import re

email_patterns = [r'(?P<lname>[\w+\.]*\w+ *)@(?P<domain> *\w+[\.\w+]*).(?P<tld>com)']

for pattern in email_patterns:
    # 'line' is a line of text in a sample web page
    matches = re.findall(pattern, line)
    for m in matches:
        print 'matches=', m
        email = '{}@{}.{}'.format(m.group('lname'), m.group('domain'), m.group('tld'))

Running this returns the following error:

email = '{}@{}.{}'.format(m.group('lname'), m.group('domain'), m.group('tld'))
AttributeError: 'tuple' object has no attribute 'group'.

I want to use named groups because the order of the groups can change depending on the text I'm matching. However, it doesn't appear to work, because Python doesn't treat 'm' as a match object with a group() method.

What's going on here, and how can I get this to work properly by using named groups?

Upvotes: 1

Views: 3611

Answers (1)

user849425


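You have two problems. As Ignacio hinted, you shouldn't be parsing (X)HTML with regular expressions; they can't reliably handle nested, irregular markup. The other problem is that you're using findall() instead of finditer(). findall() returns plain data, not match objects: with no groups it returns a list of the matched strings, with one group a list of that group's strings, and with more than one group a list of tuples with one string per group. Those tuples have no group() method, and the group names are lost.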
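To see what findall() actually hands back, here is a minimal sketch that runs the question's pattern (unchanged, just written as a raw string) over a made-up sample line; the `line` value is an assumption, since the course's pages aren't shown. It also shows one possible workaround if you want to keep findall(): a compiled pattern's `groupindex` dict maps each group name to its 1-based group number, which can be turned into a tuple index.

```python
import re

# The question's pattern, unchanged except that it is written as a raw string.
EMAIL_PATTERN = r'(?P<lname>[\w+\.]*\w+ *)@(?P<domain> *\w+[\.\w+]*).(?P<tld>com)'

# Hypothetical sample line standing in for a line of the course's web page.
line = 'Email me at jane.doe@example.com or bob@mail.server.com today.'

matches = re.findall(EMAIL_PATTERN, line)
for m in matches:
    # m is a plain tuple of strings in group order; the names are gone
    # and there is no m.group() method to call.
    print(repr(m))

# Workaround that keeps findall(): groupindex maps each group name to its
# 1-based group number, so subtract 1 to get the tuple index.
compiled = re.compile(EMAIL_PATTERN)
idx = compiled.groupindex
emails = [
    '{}@{}.{}'.format(t[idx['lname'] - 1], t[idx['domain'] - 1], t[idx['tld'] - 1])
    for t in compiled.findall(line)
]
```

This works, but it is clumsier than simply switching to finditer().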

finditer(), on the other hand, returns an iterator of match objects (MatchObject instances in Python 2.7), which have a group() method that accepts group names.
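Applied to the question's loop, the fix is changing findall() to finditer(). A minimal sketch, again assuming a made-up sample `line`, and using single-argument `print()` calls so it runs under both Python 2.7 and 3:

```python
import re

email_patterns = [r'(?P<lname>[\w+\.]*\w+ *)@(?P<domain> *\w+[\.\w+]*).(?P<tld>com)']

# Hypothetical sample line standing in for a line of the course's web page.
line = 'Email me at jane.doe@example.com or bob@mail.server.com today.'

emails = []
for pattern in email_patterns:
    # finditer() yields match objects, so named groups can be looked up.
    for m in re.finditer(pattern, line):
        email = '{}@{}.{}'.format(m.group('lname'), m.group('domain'), m.group('tld'))
        emails.append(email)
        print(email)
```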

From the Python documentation for re:

re.findall(pattern, string, flags=0) Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
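The return shapes described in that paragraph can be checked on a toy string; the string and patterns here are illustrative only. Naming the groups does not change the result: findall() still returns bare tuples.

```python
import re

text = 'a1 b2 c3'

# No groups: a list of the whole matched strings.
no_groups = re.findall(r'[a-z]\d', text)

# Exactly one group: a list of that group's strings.
one_group = re.findall(r'([a-z])\d', text)

# More than one group: a list of tuples, one element per group.
two_groups = re.findall(r'([a-z])(\d)', text)

# Named groups give exactly the same tuples; the names are discarded.
named_groups = re.findall(r'(?P<letter>[a-z])(?P<digit>\d)', text)

for result in (no_groups, one_group, two_groups, named_groups):
    print(repr(result))
```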

re.finditer(pattern, string, flags=0) Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
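Since the point of using names is that the group order can vary, match objects offer two more conveniences: groupdict() returns a name-to-text dict that can feed str.format() keyword placeholders, and group() accepts several names at once, returning a tuple in the order requested. A minimal sketch with the same made-up sample line (an assumption, not the course data):

```python
import re

EMAIL_PATTERN = r'(?P<lname>[\w+\.]*\w+ *)@(?P<domain> *\w+[\.\w+]*).(?P<tld>com)'

# Hypothetical sample line standing in for a line of the course's web page.
line = 'Email me at jane.doe@example.com or bob@mail.server.com today.'

emails = []
for m in re.finditer(EMAIL_PATTERN, line):
    # groupdict() maps each named group to the text it matched, so the
    # template refers to names only, never to group positions.
    emails.append('{lname}@{domain}.{tld}'.format(**m.groupdict()))

first = re.search(EMAIL_PATTERN, line)
# Several names at once: values come back in the order requested,
# regardless of where the groups appear in the pattern.
tld_then_lname = first.group('tld', 'lname')
print(repr(tld_then_lname))
```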

Upvotes: 2
