I'm trying to make a small program that gets a random website and counts the elements.
Here is my error:
Traceback (most recent call last):
  File "elements counter.py", line 23, in <module>
    if elem[1] == string:
TypeError: 'int' object is unsubscriptable
Here is my code:
from urllib2 import Request, urlopen, URLError
print 'Fetching URL..'
try:
    html = urlopen(Request("http://www.randomwebsite.com/cgi-bin/random.pl"))
except URLError:
    html = urlopen(Request("http://www.randomwebsitemachine.com/random_website/"))
print 'Loading HTML..'
ellist = [(None,None),]
isel = False
string = ''
for char in html.read():
    if char == '<':
        isel=True
    elif isel:
        if char == ' ' or char == '>':
            if string in ellist:
                for elem in ellist:
                    if elem[1] == string:
                        elem[0] += 1
            else:
                ellist += (1,string)
            isel = False
            string = ''
        else:
            string += char
print sorted(ellist, key = lambda tempvar: tempvar[0])
html.close()
raw_input()
Please point out if you find anything more wrong in the code.
When you do
ellist += (1,string)
it's the same as
ellist.extend((1,string))
so ellist looks something like
[(None, None), 1, string]
so when you get to the second element in the for loop, it's an int, not a tuple.
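A quick way to see this for yourself is the sketch below. It uses Python 3 syntax (the list behaviour is the same in Python 2, though the exact error wording differs between versions), and 'div' is just a stand-in for whatever tag name string happens to hold:

```python
ellist = [(None, None)]
string = 'div'  # stand-in for a tag name read from the page

# list += accepts any iterable and adds its items one at a time,
# exactly like ellist.extend((1, string))
ellist += (1, string)
print(ellist)  # [(None, None), 1, 'div']

# The original loop indexes every element, and the bare int 1 fails
error = None
try:
    for elem in ellist:
        if elem[1] == string:
            pass
except TypeError as exc:
    error = exc  # keep a reference; Python 3 unbinds exc after the block
print(repr(error))
```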
Instead, do
ellist.append((1,string))
or, if you really want to use +=,
ellist += [(1,string)]
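Both fixes add the tuple as a single element, which is easy to check (again Python 3 syntax, with 'div' standing in for a tag name):

```python
string = 'div'  # stand-in for a tag name

with_append = [(None, None)]
with_append.append((1, string))   # appends the tuple itself

with_plus = [(None, None)]
with_plus += [(1, string)]        # extends by a one-item list

print(with_append)                # [(None, None), (1, 'div')]
print(with_append == with_plus)   # True
```

The += version works because += iterates over its right-hand side, so wrapping the tuple in a list makes that one tuple the only item added.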
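Once that's fixed, two more problems show up. First, string in ellist asks whether the bare tag name is itself an item of ellist, but ellist holds (count, name) tuples, so the test is always False and every tag gets appended as a new (1, name) entry instead of being counted. Second, tuples are immutable, so elem[0] += 1 would raise a TypeError even if that branch were reached. A dict keyed by tag name (or collections.Counter, on Python 2.7+) avoids both problems. Also note that closing tags like </p> get collected as "/p", and you won't properly handle angle brackets in quoted attribute values or in HTML comments. If you want to parse HTML, use one of the many HTML parsers out there, like Python's HTMLParser module (html.parser in Python 3), lxml, or BeautifulSoup.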
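For the parser route, here is a minimal sketch for Python 3 (where the HTMLParser module lives at html.parser). It counts start tags with collections.Counter; the class name TagCounter, the helper count_elements and the sample markup are made up for illustration, and the page is inlined rather than fetched so it runs offline:

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags by name; the parser deals with comments and quotes."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # tag arrives lower-cased; self-closing tags such as <br/> also
        # reach this method through the default handle_startendtag
        self.counts[tag] += 1


def count_elements(html_text):
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


sample = """<html><body>
<!-- <div> inside a comment is not counted -->
<p title="a <b> inside quotes">one</p>
<p>two<br/></p>
</body></html>"""

counts = count_elements(sample)
# highest count first (the original sorted(...) call sorted ascending)
for tag, n in counts.most_common():
    print(tag, n)
```

To try it on a live page, urllib.request.urlopen(url).read() returns bytes that must be decoded before calling feed(); which charset to decode with is an assumption you would need to check per page.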