Reputation: 53
Yes, I'm a newbie to Python. This is my first program after studying it.
It is an HTML crawler that fetches all the mp3 files on a certain site.
I have done the job, but I'm confused by two problems. Here is what I have been through:
I tried to add a class derived from HTMLParser:
    class MyHTMLParser(HTMLParser.HTMLParser):
        def handle_starttag(self, tag, attrs):
            pass
It worked, so I tried to add a list member to remember every URL it meets:
    class MyHTMLParser(HTMLParser.HTMLParser):
        urlList = []
        def handle_starttag(self, tag, attrs):
            pass
But I soon discovered that it behaves like a "static" class member in C++.
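A minimal sketch of that shared behaviour, using a hypothetical Collector class instead of the parser (the class and method names are made up for illustration):

```python
class Collector:
    # Class attribute: a single list object shared by every instance.
    urls = []

    def add(self, url):
        # self.urls resolves to the class attribute, so this mutates the shared list.
        self.urls.append(url)


first = Collector()
second = Collector()
first.add("a.mp3")

print(second.urls)                # ['a.mp3']
print(first.urls is second.urls)  # True
```

Appending through one instance is visible from the other because both names refer to the same list stored on the class.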
In Python we do not need to declare a member before using it,
so the code became:
    class MyHTMLParser(HTMLParser.HTMLParser):
        def handle_starttag(self, tag, attrs):
            if something:
                self.urlList.append(target)
But Python raised an error when running it, saying that MyHTMLParser has no attribute "urlList".
I'm confused: why doesn't Python add it automatically?
So I added the "initialization" (and "declaration", in my mind) like this:
    class MyHTMLParser(HTMLParser.HTMLParser):
        def __init__(self):
            self.urlList = []
        def handle_starttag(self, tag, attrs):
            if something:
                self.urlList.append(target)
But in this form Python reported an error inside HTMLParser. I found that it's because I didn't call the parent __init__(),
so the code became:
    class MyHTMLParser(HTMLParser.HTMLParser):
        def __init__(self):
            #DO PARENT CLASS INIT
            self.urlList = []
        def handle_starttag(self, tag, attrs):
            if something:
                self.urlList.append(target)
For the #DO PARENT CLASS INIT part, I found the two ways listed below.
One is stupid: I went to the HTMLParser library, found what it did in __init__(), and copied it:
    self.reset()  # it's what HTMLParser did in __init__()
or
    HTMLParser.HTMLParser.__init__(self)
I know it works, but it's ugly, so my questions are:
1. What is the elegant way to call the parent __init__() method when overriding it?
2. How do I add a list member without a "declaration" like the one in my code?
Upvotes: 2
Views: 346
Reputation: 26921
To invoke a parent class method, you use super(<currentClass>, self).method(<parent method arguments>). In your case it would be something like:
    class MyHTMLParser(HTMLParser.HTMLParser):
        def __init__(self):
            # Python 3 (or any new-style class)
            super(MyHTMLParser, self).__init__()
            # In Python 2, HTMLParser is an old-style class, so the above
            # won't work; call the parent explicitly instead:
            # HTMLParser.HTMLParser.__init__(self)
Refer to the super manual.
As for "declaration": it looks like you misunderstood what "declaration" means in Python. Python does not automatically create anything when you try to read it. So, by doing
    self.my_urllist = []
you just tell it to create an empty list and store it in the my_urllist attribute of the instance. It's not a "declaration", just an assignment. However,
    self.my_urllist.append(target)
reads as "please read self.my_urllist, then try to invoke the append method on whatever you have read". Two things can go wrong here: (1) self.my_urllist does not exist; (2) self.my_urllist does not have an append method.
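Both failure modes can be reproduced in a few lines. This is only a sketch: Holder is a hypothetical empty class standing in for the parser, and "a.mp3" is a placeholder value.

```python
class Holder:
    pass


holder = Holder()

# Case 1: the attribute was never assigned, so reading it already fails.
try:
    holder.my_urllist.append("a.mp3")
except AttributeError as exc:
    first_error = str(exc)

# Case 2: the attribute exists, but its value has no append method.
holder.my_urllist = 42
try:
    holder.my_urllist.append("a.mp3")
except AttributeError as exc:
    second_error = str(exc)

print(first_error)
print(second_error)
```

In both cases Python raises AttributeError; the message names the missing attribute (my_urllist in the first case, append in the second).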
So, in order to make it work, you must ensure that before you call self.my_urllist.append you actually have a self.my_urllist attribute and that it is a list. The Pythonic way to do this is to create my_urllist in __init__ and assign it some sensible value (an empty list in your case).
If you absolutely don't want to override __init__, you can use a technique known as "lazy initialization", like so:
    class MyHTMLParser(HTMLParser.HTMLParser):
        @property
        def my_urllist(self):
            if not hasattr(self, '_my_urllist'):
                self._my_urllist = []
            return self._my_urllist
property is a decorator that makes a method look like an attribute. Either way, an object attribute (_my_urllist) still gets created and initialized; it is just delayed until you really need it.
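To see the delay in action, here is a small sketch written for Python 3, where the module is html.parser rather than HTMLParser. The class name LazyParser, the handle_starttag body and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LazyParser(HTMLParser):
    # No __init__ override: the backing list is created on first access.
    @property
    def my_urllist(self):
        if not hasattr(self, "_my_urllist"):
            self._my_urllist = []
        return self._my_urllist

    def handle_starttag(self, tag, attrs):
        # Record the href of every <a> tag.
        for name, value in attrs:
            if tag == "a" and name == "href" and value:
                self.my_urllist.append(value)


parser = LazyParser()
print(hasattr(parser, "_my_urllist"))  # False: nothing created yet
parser.feed('<a href="song.mp3">song</a>')
print(parser.my_urllist)               # ['song.mp3']
```

The backing attribute appears only once my_urllist is first read, which here happens inside handle_starttag.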
Upvotes: 1
Reputation: 2224
Your second way is the right way to do it. It's not ugly.
As an alternative you can do:
    super(MyHTMLParser, self).__init__()
You are doing everything the right way.
Upvotes: 0
Reputation: 280465
Never copy the parent method's code. The two standard options to call the parent's __init__ are
    HTMLParser.HTMLParser.__init__(self)
and
    super(MyHTMLParser, self).__init__()
If you're using Python 3, the second option has been simplified to
    super().__init__()
There's no way to avoid the self.my_urllist = [] step. After all, Python needs some way to tell that this attribute is supposed to be a list.
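Putting both points together, a minimal Python 3 sketch might look like the following. It assumes the html.parser module (the Python 3 home of HTMLParser); the class name Mp3LinkParser, the ".mp3" filter and the sample HTML are illustrative guesses at what the crawler needs, not the asker's actual code.

```python
from html.parser import HTMLParser


class Mp3LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()    # let HTMLParser set up its own state first
        self.my_urllist = []  # per-instance list, created by plain assignment

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep only links whose href ends in .mp3 (case-insensitive).
            if name == "href" and value and value.lower().endswith(".mp3"):
                self.my_urllist.append(value)


parser = Mp3LinkParser()
parser.feed('<a href="one.mp3">1</a> <a href="page.html">2</a> <a href="Two.MP3">3</a>')
print(parser.my_urllist)  # ['one.mp3', 'Two.MP3']
```

Because the list is assigned in __init__, each parser instance gets its own list rather than sharing one through the class.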
Upvotes: 0