sgp

Reputation: 1778

Python 3 Special characters escaping

import urllib
from urllib.request import urlopen


address='http://www.iitb.ac.in/acadpublic/RunningCourses.jsp?deptcd=EE&year=2012&semester=1'
source= urlopen(address).read()
source=str(source)


from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
        def handle_data(self, data):
            x=str(data)
            if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t'):
                print("Encountered some data:",x)

parser = MyHTMLParser(strict=False)
parser.feed(source)

The code above doesn't work as intended: it still prints whitespace-only strings such as '\r\n\t\t\t\t'. Any suggestions?

Upvotes: 1

Views: 1651

Answers (2)

jamylak

Reputation: 133764

if x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t')

should be

if x not in ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t')

or better:

if not x.isspace()

Your original condition is evaluated as:

if (x != ('\r\n\t\t\t\t')) or '\r\n\t\t\t\t\t' or '\r\n\r\n\t\t\t'

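Notice that the last two operands are evaluated on their own, not compared with x. A non-empty string is always truthy (only an empty string evaluates as False), so the whole condition is always true and every chunk gets printed.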
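
To see the difference concretely, here is a small sketch that runs the three conditions over the whitespace strings from the question plus one made-up text value ('EE 101' is only an illustrative sample, and the function names are mine, not part of any library):

```python
# The three whitespace-only strings from the question, plus a
# hypothetical piece of real text for comparison.
samples = ['\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t', 'EE 101']

def original_check(x):
    # Parsed as (x != a) or b or c; b and c are non-empty strings,
    # so the result is truthy no matter what x is.
    return bool(x != ('\r\n\t\t\t\t') or ('\r\n\t\t\t\t\t') or ('\r\n\r\n\t\t\t'))

def membership_check(x):
    # True only when x is none of the listed strings.
    return x not in ('\r\n\t\t\t\t', '\r\n\t\t\t\t\t', '\r\n\r\n\t\t\t')

def isspace_check(x):
    # True when x contains at least one non-whitespace character
    # (note that an empty string also passes, since ''.isspace() is False).
    return not x.isspace()

original_results = [original_check(s) for s in samples]
membership_results = [membership_check(s) for s in samples]
isspace_results = [isspace_check(s) for s in samples]

print(original_results)    # [True, True, True, True]
print(membership_results)  # [False, False, False, True]
print(isspace_results)     # [False, False, False, True]
```

The membership test only filters the exact strings listed, while the isspace() version also handles any other mix of tabs and newlines.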

Upvotes: 1

Omair Shamshir

Reputation: 2136

Maybe the number of \t, \r, and similar characters varies. Try this:

if x.replace('\r','').replace('\n','').replace('\t','').strip():
    print("Encountered some data:",x)
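
For what it's worth, a plain x.strip() should give the same truth value here, since strip() with no arguments already removes \r, \n and \t from both ends. Below is a minimal sketch of the whole parser using that check. SAMPLE_HTML is a made-up stand-in for the downloaded page and TextCollector is just an illustrative name. The last few lines show a separate observation that may matter for the question's code: calling str() on bytes in Python 3 returns the repr, with backslash escapes as literal characters, whereas decode() yields real whitespace (the 'utf-8' encoding is an assumption about the page).

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page, not the real site content.
SAMPLE_HTML = "<table>\r\n\t\t\t\t<tr>\r\n\t\t\t\t\t<td>EE 101</td>\r\n\r\n\t\t\t</tr></table>"

class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # strip() drops leading/trailing whitespace, including \r, \n and \t,
        # so whitespace-only chunks become '' and are skipped.
        if data.strip():
            self.chunks.append(data)

parser = TextCollector()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.chunks)  # ['EE 101']

# str() on bytes gives the repr, so the escapes are literal backslashes.
raw = b"\r\n\t"
as_repr = str(raw)            # the 10-character text b'\r\n\t'
as_text = raw.decode("utf-8")  # assumed encoding; real CR, LF and TAB
print(as_repr.isspace(), as_text.isspace())  # False True
```

If the page is read as bytes, decoding it before feeding the parser keeps the whitespace checks above meaningful.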

Upvotes: 0
