Reputation: 1400
I've been putting together a list of pages that we need to update with new content (we're switching media formats). In the process I'm cataloging pages that correctly have the new content.
Here's the general idea of what I'm doing:
Everything works fine up until the 3rd regex pattern match, where I get the following:
'NoneType' object has no attribute 'group'
# only interested in embedded content
pattern = "(<embed .*?</embed>)"
# matches content pointing to our old root
pattern2 = 'data="(http://.*?/media/.*?")'
# matches content pointing to our new root
pattern3 = 'data="(http://.*?/content/.*?")'
matches = re.findall(pattern, filebuffer)
for match in matches:
    if len(match) > 0:
        urla = re.search(pattern2, match)
        if urla.group(1) is not None:
            print filename, urla.group(1)
        urlb = re.search(pattern3, match)
        if urlb.group(1) is not None:
            print filename, urlb.group(1)
thank you.
Upvotes: 8
Views: 28121
Reputation: 319821
The reason for the AttributeError is that search or match return either a MatchObject or None. Only one of these has a group method, and it's not None. So you need to do:
url = re.search(pattern2, match)
if url is not None:
    print(filename, url.group(1))
P.S. PEP-8 suggests using 4 spaces for indentation. It's not just an opinion, it's a good practice. Your code is fairly hard to read.
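To make the two possible return values concrete, here is a minimal sketch. It reuses pattern2 from the question; the two sample tags are invented for illustration. Each print takes a single argument, so the snippet should behave the same on Python 2 and Python 3.

```python
import re

# pattern2 from the question: content pointing to the old root
pattern2 = 'data="(http://.*?/media/.*?")'

# Hypothetical sample tags, invented for this illustration
old_tag = '<embed data="http://example.com/media/a.swf"></embed>'
new_tag = '<embed data="http://example.com/content/a.mp4"></embed>'

hit = re.search(pattern2, old_tag)    # a match object
miss = re.search(pattern2, new_tag)   # None, because "/media/" never appears

if hit is not None:
    print(hit.group(1))
if miss is None:
    print("no match, so there is no group() to call")
```

Note that the closing quote sits inside the parentheses of the question's patterns, so group(1) includes the trailing `"` character.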
Upvotes: 3
Reputation: 14318
I got the same problem.
Using Python 2.6, you can solve it in this way:
for match in matches:
    if len(match) > 0:
        urla = re.search(pattern2, match)
        try:
            print filename, urla.group(1)
        except AttributeError:
            print "Problem with", pattern2
        urlb = re.search(pattern3, match)
        try:
            print filename, urlb.group(1)
        except AttributeError:
            print "Problem with", pattern3
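If you prefer the try/except style, it may help to wrap it in a small helper that catches only AttributeError, so that unrelated errors (a typo in a variable name, a bad pattern) still surface. The helper name first_group is hypothetical, not something from the question.

```python
import re

def first_group(pattern, text):
    """Return group(1) of the first match, or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    try:
        return re.search(pattern, text).group(1)
    except AttributeError:
        # re.search returned None, which has no group() method
        return None
```

The caller can then test the result with `is not None` before printing it.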
Upvotes: 2
Reputation: 37103
Please also note your mistaken assumption that the error was in the third match, when it was in fact in the second. This seems to have led to the mistaken assumption that the second match was doing something to invalidate the third, sending you way off track.
Upvotes: 0
Reputation: 3571
Your exception means that urla has a value of None. Since urla's value is determined by the re.search call, it follows that re.search returns None. And this happens when the string doesn't match the pattern.
So basically you should use:
urla = re.search(pattern2, match)
if urla is not None:
    print filename, urla.group(1)
instead of what you have now.
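Putting the check into the whole loop, a sketch might look like the following. The patterns are the ones from the question; the function name classify_embeds, the constant names, and the sample page string are assumptions made up for this example. It collects the URLs into lists instead of printing them, which also keeps it independent of the Python 2 print statement.

```python
import re

# Patterns copied from the question
EMBED = "(<embed .*?</embed>)"
OLD_ROOT = 'data="(http://.*?/media/.*?")'
NEW_ROOT = 'data="(http://.*?/content/.*?")'

def classify_embeds(filebuffer):
    """Return (old_urls, new_urls) found in the embed tags of filebuffer."""
    old_urls, new_urls = [], []
    for tag in re.findall(EMBED, filebuffer):
        urla = re.search(OLD_ROOT, tag)
        if urla is not None:          # check the match object, not group(1)
            old_urls.append(urla.group(1))
        urlb = re.search(NEW_ROOT, tag)
        if urlb is not None:
            new_urls.append(urlb.group(1))
    return old_urls, new_urls

# Hypothetical page content, invented for this sketch
page = ('<embed data="http://example.com/media/a.swf"></embed>'
        '<embed data="http://example.com/content/b.mp4"></embed>')
old_urls, new_urls = classify_embeds(page)
```

Each tag normally matches only one of the two patterns, so one of the two searches returns None on every iteration. That is why both results need the `is not None` check.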
Upvotes: 18