Reputation: 4689
I'm having trouble understanding why this class object doesn't take my init params. This is python 3.6.
IN ONE FILE I import a webcrawler and pass kwargs into it:
import metacrawler as mc
mc.crawlwrapper(url=archive_url, archive_org=True, index_pages=True, depth_limit=2, fileroot='content/')
DEBUG: yes, archive_org is defined as True at this point:
{'archive_org': True}
THAT GOES into an intermediate function that creates a class instance. Here's the in-between function that passes everything from the first call into the Crawler:
def crawlwrapper(**kw):
    fileroot = kw.get('fileroot', '')
    url = kw['url']
    print('DEBUG(pre): {0}'.format(kw))
    depth_limit = kw.get('depth_limit', 3)
    confine_prefix = kw.get('confine_prefix')  # used to make archive.org not follow external links
    archive_org = kw.get('archive_org', False)  # special urlparse rules apply
    exclude = kw.get('exclude', [])
    print_pov = kw.get('print_pov', False)
    index_pages = kw.get('index_pages')
    print('DEBUG(post): depth_limit, confine_prefix, index_pages, archive_org {0}'.format([depth_limit, confine_prefix, index_pages, archive_org]))
    crawler = Crawler(url, depth_limit, confine_prefix, exclude, index_pages, print_pov, archive_org)
    crawler.crawl()
Here is the Crawler
that receives kwargs from the crawlwrapper(**kw) function:
class Crawler(object):
    ## BUG: for some reason, at some point, the init defaults don't get overridden when instantiated.
    def __init__(self, url, depth_limit, confine=None, exclude=[], locked=True, filter_seen=True, index_pages=True, print_pov=False, archive_org=None):
        print('depth_limit {0}, confine {1}, index_pages {2}, archive_org {3}'.format(depth_limit, confine, index_pages, archive_org))
DEBUG: Here's what is received in Crawler.__init__:
depth_limit 2, confine http://www.cfu.or.ug, index_pages True, archive_org None
Notice that archive_org changed from True to None?
Why didn't Crawler receive my archive_org=True parameter?
Upvotes: 1
Views: 68
Reputation: 42649
It’s because when you write Class(a, b), Python passes the values of a and b to whatever names were defined as the first two parameters of that class (or function). But when you write Class(a=a, b=b), you’re saying “call Class.__init__ with a set to a (from my scope) and b set to b (from my scope)”.
What you’ve written in your first call is equivalent to crawler = Crawler(url=url, depth_limit=depth_limit, confine=confine_prefix, exclude=exclude, locked=index_pages, filter_seen=print_pov, index_pages=archive_org, print_pov=False, archive_org=None).
In other words, there’s no implicit relationship between the namespace/scope of the caller and the signature of the function/method being called. The parameters are assigned positionally unless you use keyword arguments (a=a).
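A minimal sketch of that positional mismatch, using a trimmed-down stand-in for the real signature (the parameter names mirror Crawler.__init__; everything else is hypothetical):

```python
def show(confine=None, exclude=(), locked=True, filter_seen=True,
         index_pages=True, print_pov=False, archive_org=None):
    """Stand-in for Crawler.__init__'s defaulted parameters."""
    return {'locked': locked, 'index_pages': index_pages,
            'archive_org': archive_org}

# Positional: the caller's variable names are irrelevant; the third and
# fourth values land on locked and filter_seen, so index_pages and
# archive_org silently keep their defaults.
print(show(None, [], 'INDEX', 'ARCHIVE'))
# {'locked': 'INDEX', 'index_pages': True, 'archive_org': None}

# Keyword: each value lands on the parameter you name, in any order.
print(show(archive_org='ARCHIVE', index_pages='INDEX'))
# {'locked': True, 'index_pages': 'INDEX', 'archive_org': 'ARCHIVE'}
```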
Python 3 adds support for keyword-only arguments that might have helped prevent this situation. Had you defined Crawler.__init__ thus:
def __init__(self, url, depth_limit, *, confine=None, exclude=[], locked=True, filter_seen=True, index_pages=True, print_pov=False, archive_org=None):
That * would signal that the remaining parameters cannot be specified positionally and must be indicated by name.
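A short sketch of the effect, assuming a stripped-down Crawler with that keyword-only signature:

```python
class Crawler(object):
    def __init__(self, url, depth_limit, *, confine=None, index_pages=True,
                 archive_org=None):
        self.archive_org = archive_org

# Keyword arguments still work, and land where you expect:
c = Crawler('http://example.com', 2, archive_org=True)
print(c.archive_org)  # True

# Extra positional arguments now fail loudly instead of silently
# sliding onto the wrong parameters:
try:
    Crawler('http://example.com', 2, None, True)
except TypeError as e:
    print(e)  # raises TypeError: too many positional arguments
```

With the bare *, the original buggy call would have raised a TypeError immediately, pointing straight at the problem.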
Honestly, I was stumped on the problem too until I got to the hint.
Upvotes: 2