Python 3 - HTML Parser - Empty Attributes

Question

def handle_starttag(self, tag, attrs):
    print(attrs)

[]

Why is my attrs an empty list? Where is the data inside the tags? I need that data, either from handle_data or from attrs.

import urllib.request
from html.parser import HTMLParser
import sys

class myHTMLParser(HTMLParser):
    
    def __init__(self):
        HTMLParser.__init__(self)
        self.country = {}
        
    def handle_starttag(self, tag, attrs):
        if tag == 'currency_name':
            self.country[self.handle_data] = tag
        print(self.country)
        
    def handle_endtag(self, tag):
        pass
    
    def handle_data(self, data):
        return(data.strip())
    
def main():
    if len(sys.argv) > 1:
        link = sys.argv[1]
    else:   
        link = 'http://www.bankofcanada.ca/stats/assets/xml/noon-five-day.xml' 
        
        
    myparser = myHTMLParser()    
    file = open(link, 'r')
    html = file.read()
    myparser.feed(html)
    file.close()
main()

jcoppens · Accepted Answer

I think you are confusing attributes with data. The document at the URL in your program does not have attributes, but it does have data. Attributes are the information inside the start tag itself. This is one way to transfer information.

In the case of your page, the information is between the start tag and the end tag.

Something like <tag attribute="value"> is one way of transferring the info.

 <tag> this is text </tag>

is another.
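To make the difference concrete, here is a minimal sketch that records what the parser reports for each form. The class name ShowEvents, the tag name item and the attribute code="USD" are made up for illustration; they are not taken from your XML.

```python
from html.parser import HTMLParser


class ShowEvents(HTMLParser):
    """Record every start tag and every piece of text the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from inside the tag.
        self.events.append(('start', tag, attrs))

    def handle_data(self, data):
        # data is the text found between tags.
        self.events.append(('data', data))


demo = ShowEvents()
# First element carries its info as an attribute, second one as text.
demo.feed('<item code="USD"></item><item>this is text</item>')
demo.close()

for event in demo.events:
    print(event)
```

The first start tag arrives with a non-empty attrs list, while the second arrives with an empty one and its content shows up in handle_data instead.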

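As there are no attributes, that list is empty. The data is passed to handle_data as its data argument. The parser ignores whatever handle_data returns, so store the text on the parser object instead of returning it.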
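One way to collect that text is to remember in handle_starttag that you are inside the tag you care about, and save the text in handle_data. This is only a sketch: the class name CurrencyParser, the names list and the _in_target flag are my own choices, and the sample string is invented, not the real Bank of Canada feed.

```python
from html.parser import HTMLParser


class CurrencyParser(HTMLParser):
    """Collect the text found between <currency_name> and </currency_name>."""

    def __init__(self):
        super().__init__()
        self.names = []          # text collected from each matching element
        self._in_target = False  # True while inside a <currency_name> element

    def handle_starttag(self, tag, attrs):
        # attrs is [] here because the tag carries no attributes.
        if tag == 'currency_name':
            self._in_target = True

    def handle_endtag(self, tag):
        if tag == 'currency_name':
            self._in_target = False

    def handle_data(self, data):
        # The return value is ignored by the parser, so keep the text on self.
        text = data.strip()
        if self._in_target and text:
            self.names.append(text)


# Invented sample input; the real feed may be structured differently.
sample = ('<rates><currency_name>U.S. dollar</currency_name>'
          '<currency_name>Euro</currency_name></rates>')

parser = CurrencyParser()
parser.feed(sample)
parser.close()
print(parser.names)
```

Note also that open() only reads local files. To parse the page at the URL you would first have to download it, for example with urllib.request.urlopen(link).read() and then decode the bytes to a string, before passing it to feed().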
