Reputation: 4740
I have this short script, and it raises AttributeError: 'NoneType' object has no attribute 'group'.
import sys
import re
#def extract_names(filename):
f = open('name.html', 'r')
text = f.read()
match = re.search (r'<hgroup><h1>(\w+)</h1>', text)
second = re.search (r'<li class="hover">Employees: <b>(\d+,\d+)</b></li>', text)
outf = open('details.txt', 'a')
outf.write(match)
outf.close()
My intention is to read an HTML file, find the <h1> tag value and the number of employees, and append them to a file. But for some reason I can't seem to get it right.
Your help is greatly appreciated.
Upvotes: 0
Views: 3091
Reputation: 20329
For the sake of completeness: the error message just indicates that your regular expression did not match, so re.search returned None, and None has no .group attribute.
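To illustrate: re.search returns None when nothing matches, so the result has to be checked before .group() is called. A minimal sketch, assuming markup like the hypothetical `sample` string below (the real contents of name.html are not shown in the question); the `extract` helper is made up for this example:

```python
import re

# Hypothetical sample; the real contents of name.html are not shown in the question.
sample = '<hgroup><h1>Acme</h1></hgroup><li class="hover">Employees: <b>1,234</b></li>'

def extract(text):
    """Return (name, employees); each is None when its pattern does not match."""
    name = re.search(r'<hgroup><h1>(\w+)</h1>', text)
    staff = re.search(r'<li class="hover">Employees: <b>(\d+,\d+)</b></li>', text)
    # re.search returns None on failure, so only call .group() on a real match.
    return (name.group(1) if name else None,
            staff.group(1) if staff else None)

print(extract(sample))                       # both patterns match
print(extract('<h1>no hgroup here</h1>'))    # neither pattern matches
```

Note also that file.write() expects a string, so you would write the captured group (for example `name.group(1)`), not the match object itself.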
Upvotes: 1
Reputation: 1121186
You are using a regular expression, but matching XML with such expressions gets too complicated, too fast. Don't do that.
Use an HTML parser instead; Python has several to choose from: ElementTree (part of the standard library), BeautifulSoup, and lxml.
The latter two handle malformed HTML quite gracefully as well, making decent sense of many a botched website.
ElementTree example:
from xml.etree import ElementTree

tree = ElementTree.parse('filename.html')
# './/h1' searches the whole tree; a bare 'h1' only matches direct children of the root.
for elem in tree.findall('.//h1'):
    print(ElementTree.tostring(elem))
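Applied to the question's goal, the same approach might look like the sketch below. It assumes the page is well-formed and shaped like the hypothetical `sample` string (ElementTree raises ParseError on malformed markup), and `extract_details` is a made-up helper name:

```python
from xml.etree import ElementTree

# Hypothetical, well-formed sample; the real name.html is not shown in the question.
sample = """<html><body>
<hgroup><h1>Acme</h1></hgroup>
<ul><li class="hover">Employees: <b>1,234</b></li></ul>
</body></html>"""

def extract_details(markup):
    """Return (h1 text, employee count text); either may be None if not found."""
    root = ElementTree.fromstring(markup)
    # findtext returns the text of the first matching element, or None.
    name = root.findtext('.//hgroup/h1')
    employees = None
    for li in root.iter('li'):
        # li.text holds the text that precedes the first child element (<b>).
        if (li.text or '').strip().startswith('Employees:'):
            employees = li.findtext('b')
    return name, employees

name, employees = extract_details(sample)
print(name, employees)
```

The two values are plain strings (or None), so once you have checked for None they can be appended to details.txt with outf.write().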
Upvotes: 6