Reputation: 1778
import urllib
from urllib.request import urlopen
address='http://www.iitb.ac.in/acadpublic/RunningCourses.jsp?deptcd=EE&year=2012&semester=1'
source= urlopen(address).read()
source=str(source)
from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        x=str(data)
        if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t'):
            print("Encountered some data:",x)
parser = MyHTMLParser(strict=False)
parser.feed(source)
The code above isn't working: it still prints whitespace-only strings such as '\r\n\t\t\t\t'. Any suggestions?
Upvotes: 1
Views: 1651
Reputation: 133764
if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t'):
should be
if x not in ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t'):
or, better:
if not x.isspace():
Your original condition is evaluated as:
if (x != ('\r\n\t\t\t\t')) or '\r\n\t\t\t\t\t' or '\r\n\r\n\t\t\t'
Notice that the last two operands are evaluated on their own, not compared with x. Only an empty string evaluates as False, so a non-empty string literal is always truthy
and the condition always passes.
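To see the difference concretely, here is a small sketch that compares the three conditions on a whitespace-only string and on a real value. The function names and the sample value 'EE 101' are made up for illustration; they are not from the original code.

```python
candidates = ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t')

def original_check(x):
    # Parsed as (x != a) or b or c. b and c are non-empty strings,
    # so the whole expression is truthy no matter what x is.
    return bool(x != '\r\n\t\t\t\t' or '\r\n\t\t\t\t\t' or '\r\n\r\n\t\t\t')

def membership_check(x):
    # True only when x is none of the listed strings.
    return x not in candidates

def whitespace_check(x):
    # True when x contains at least one non-whitespace character
    # (also True for the empty string, since ''.isspace() is False).
    return not x.isspace()

for sample in ('\r\n\t\t\t\t', 'EE 101'):
    print(repr(sample),
          original_check(sample),
          membership_check(sample),
          whitespace_check(sample))
```

The original check returns True for both samples, while the other two return False for the whitespace-only string and True for 'EE 101'.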
Upvotes: 1
Reputation: 2136
Maybe the number of \t, \r, etc. varies. Try this:
if x.replace('\r','').replace('\n','').replace('\t','').strip():
    print("Encountered some data:",x)
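For completeness, a minimal self-contained sketch of a parser that skips whitespace-only data regardless of how many \r, \n or \t characters it contains. It runs on a made-up HTML string instead of fetching the page, and the class name TextCollector is hypothetical. It assumes the page bytes were decoded to text (for example with .decode()); calling str() on a bytes object gives its repr, where \r\n shows up as literal backslash characters that isspace() and strip() do not treat as whitespace. The strict argument is left out because it was removed from HTMLParser in Python 3.5.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only data that has at least one non-whitespace character.
        if not data.isspace():
            self.chunks.append(data.strip())

# Made-up sample that mimics the whitespace runs from the question.
sample_html = ("<table>\r\n\t\t\t\t<tr>\r\n\t\t\t\t\t"
               "<td>EE 101</td>\r\n\r\n\t\t\t</tr></table>")

parser = TextCollector()
parser.feed(sample_html)
parser.close()
print(parser.chunks)
```

With the real page, something like urlopen(address).read().decode() would replace sample_html, provided the default UTF-8 decoding matches the page's encoding.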
Upvotes: 0