Ethansong

Reputation: 53

Python: how to add an empty list in derived class

Yes, I'm a newbie to Python. This is my first program after studying it.

It is an HTML crawler that fetches all the mp3 files on a certain site.

I have done the job, but I am confused by two problems. Here is what I have been through:

I tried to add a class derived from HTMLParser.

class MyHTMLParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        pass

It worked, so I tried to add a list member to record every URL it encounters.

class MyHTMLParser(HTMLParser.HTMLParser):
    urlList = []
    def handle_starttag(self, tag, attrs):
        pass

But I soon discovered that this behaves like a "static" class member in C++.

In Python we do not need to declare a member before using it.

So the code became:

class MyHTMLParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        if something:
            self.urlList.append(target)

But Python raised an error when running it, saying that MyHTMLParser has no attribute "urlList".

I'm confused: why doesn't Python add it automatically?

So I added the "initialization" (and "declaration", in my mind) like this:

class MyHTMLParser(HTMLParser.HTMLParser):
    def __init__(self):
        self.urlList = []
    def handle_starttag(self, tag, attrs):
        if something:
            self.urlList.append(target)

But in this form Python raised an error from inside HTMLParser. I found that it was because I didn't call the parent's __init__().

So the code became:

class MyHTMLParser(HTMLParser.HTMLParser):
    def __init__(self):
        #DO PARENT CLASS INIT
        self.urlList = []
    def handle_starttag(self, tag, attrs):
        if something:
            self.urlList.append(target)

For the #DO PARENT CLASS INIT step, I found the two ways listed below.

The first is crude: I looked in the HTMLParser library source, found what it does in __init__(), and copied it:

self.reset() #it's what HTMLParser did in __init__()

or

HTMLParser.HTMLParser.__init__(self) 

I know this works, but it looks ugly, so my questions are:

1. What is the elegant way to call the parent's __init__() method when overriding it?

2. How can I add a list member without a "declaration" like the one in the code above?

Upvotes: 2

Views: 346

Answers (3)

J0HN

Reputation: 26921

To invoke a parent class method, use super(<currentClass>, self).method(<parent method arguments>). In your case it would be something like:

class MyHTMLParser(HTMLParser.HTMLParser):
    def __init__(self):
        # Python 3 (or any new-style class): use super()
        super(MyHTMLParser, self).__init__()
        # In Python 2 HTMLParser is an old-style class, so the above won't work;
        # call the parent's __init__ directly instead
        HTMLParser.HTMLParser.__init__(self)

Refer to the documentation for super().

As for "declaration": it looks like you misunderstood what "declaration" means in Python. Python does not automatically create things when you try to read them. So, by doing

self.my_urllist = []

you just tell it to create an empty list and store it in the my_urllist attribute of the instance. It's not a "declaration", just an assignment. However,

self.my_urllist.append(target)

reads as "please read self.my_urllist, then try to invoke the append method on whatever you have read". Two things can go wrong here: (1) self.my_urllist does not exist; (2) self.my_urllist does not have an append method.

So, to make it work, you must ensure that before you call self.my_urllist.append, the self.my_urllist attribute actually exists and is a list. The Pythonic way to do this is to create my_urllist in __init__ and assign it a sensible value (an empty list in your case).
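A minimal sketch of the pattern described above, assuming Python 3 (where the class lives in html.parser rather than the Python 2 HTMLParser module). The filter on href values ending in ".mp3" and the sample HTML are made up for illustration and are not taken from the question:

```python
from html.parser import HTMLParser  # Python 3; the Python 2 module is named HTMLParser


class MyHTMLParser(HTMLParser):
    def __init__(self):
        # Let the parent class set up its own state first.
        super().__init__()
        # Plain assignment: this creates the per-instance list.
        self.my_urllist = []

    def handle_starttag(self, tag, attrs):
        # Hypothetical filter: keep href values of <a> tags that end in ".mp3".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.endswith(".mp3"):
                    self.my_urllist.append(value)


parser = MyHTMLParser()
parser.feed('<a href="song.mp3">one</a><a href="page.html">two</a>')
print(parser.my_urllist)

# A second parser starts with its own empty list.
other = MyHTMLParser()
```

Because the list is assigned inside __init__, each parser instance gets its own list, unlike the class-level urlList from the question.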

If you absolutely don't want to override __init__, you can use a technique known as "lazy initialization", like so:

class MyHTMLParser(HTMLParser.HTMLParser):
    @property
    def my_urllist(self):
        if not hasattr(self, '_my_urllist'):
            self._my_urllist = []
        return self._my_urllist

property is a decorator that makes a method look like an attribute. Either way, an object attribute (_my_urllist) is still created and initialized; it is just delayed until you actually need it.
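A short usage sketch of the lazy property, again assuming Python 3's html.parser. The class name LazyHTMLParser and the sample HTML are hypothetical; the point is only that the backing attribute does not exist until the first access:

```python
from html.parser import HTMLParser  # Python 3 module name


class LazyHTMLParser(HTMLParser):
    @property
    def my_urllist(self):
        # Create the backing list on first access only.
        if not hasattr(self, "_my_urllist"):
            self._my_urllist = []
        return self._my_urllist

    def handle_starttag(self, tag, attrs):
        # Illustrative handler: record every href found on an <a> tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.my_urllist.append(href)


lazy = LazyHTMLParser()
existed_before = hasattr(lazy, "_my_urllist")  # nothing has touched the property yet
lazy.feed('<a href="a.mp3">a</a>')
existed_after = hasattr(lazy, "_my_urllist")
print(existed_before, existed_after, lazy.my_urllist)
```

No __init__ override is needed here, at the cost of a hasattr check on every access.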

Upvotes: 1

Serbitar

Reputation: 2224

Your second way is the right way to do it. It's not ugly.

As an alternative you can do:

super(MyHTMLParser, self).__init__()

You are doing everything the right way.

Upvotes: 0

user2357112

Reputation: 280465

Never copy the parent method's code. The two standard options to call the parent's __init__ are

HTMLParser.HTMLParser.__init__(self)

and

super(MyHTMLParser, self).__init__()

If you're using Python 3, the second option has been simplified to

super().__init__()

There's no way to avoid the self.my_urllist = [] step. After all, Python needs some way to tell that this attribute is supposed to be a list.
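To see why the assignment belongs in __init__ rather than in the class body, here is a small sketch contrasting the two. The class names SharedList and OwnList are made up for illustration and do not involve HTMLParser at all:

```python
class SharedList:
    # Class attribute: a single list object shared by every instance,
    # which is the "static member" behaviour noticed in the question.
    urls = []


class OwnList:
    def __init__(self):
        # Instance attribute: a fresh list for each object.
        self.urls = []


a, b = SharedList(), SharedList()
a.urls.append("x.mp3")
shared_result = list(b.urls)  # b sees what a appended

c, d = OwnList(), OwnList()
c.urls.append("x.mp3")
own_result = list(d.urls)  # d is unaffected

print(shared_result, own_result)
```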

Upvotes: 0
