Marc Maxmeister

Reputation: 4689

Parameters/kwargs I pass into a class during instantiation __init__ get ignored

I'm having trouble understanding why this class object doesn't take my init params. This is Python 3.6.

IN ONE FILE I import a webcrawler and pass kwargs into it:

import metacrawler as mc
mc.crawlwrapper(url=archive_url, archive_org=True, index_pages=True, depth_limit=2, fileroot='content/')

DEBUG: yes, archive_org is True at this point:

{'archive_org': True}

THAT GOES into an intermediate function that creates a class instance. Here's the in-between function that parses everything from the first function into the Crawler:

def crawlwrapper(**kw):
    fileroot = kw.get('fileroot','')
    url = kw['url']
    print('DEBUG(pre):{0}'.format(kw))
    depth_limit = kw.get('depth_limit',3)
    confine_prefix= kw.get('confine_prefix') # use to make archive.org not follow external links
    archive_org = kw.get('archive_org',False) # special urlparse rules apply
    exclude=kw.get('exclude',[])
    print_pov=kw.get('print_pov',False)    
    index_pages = kw.get('index_pages')    
    print('DEBUG(post): depth_limit, confine_prefix, index_pages, archive_org {0}'.format([depth_limit, confine_prefix, index_pages, archive_org]))

    crawler = Crawler(url, depth_limit, confine_prefix, exclude, index_pages, print_pov, archive_org)
    crawler.crawl()

Here is the Crawler that receives kwargs from the crawlwrapper(**kw) function:

class Crawler(object):
    ##BUG: for some reason, at some point, the init defaults don't get overridden when instantiated.

    def __init__(self, url, depth_limit, confine=None, exclude=[], locked=True, filter_seen=True, index_pages=True, print_pov=False, archive_org=None):
        print('depth_limit {0}, confine {1}, index_pages {2}, archive_org {3}'.format(depth_limit, confine, index_pages, archive_org))

DEBUG: Here's what is received in the crawler.crawl() class method:

depth_limit 2, confine http://www.cfu.or.ug, index_pages True, archive_org None

Notice that archive_org changed from True to None?

Why didn't Crawler receive my archive_org=True parameter?

Upvotes: 1

Views: 68

Answers (1)

Jason R. Coombs

Reputation: 42649

It’s because when you write Class(a, b), the values of a and b are passed to the first two parameter names in that class's __init__ (or the function's signature), in order.

But when you say Class(a=a, b=b), you’re saying “call Class.__init__ with a set to a (from my scope) and b set to b (from my scope)”.

What you’ve written in your first call is equivalent to crawler = Crawler(url=url, depth_limit=depth_limit, confine=confine_prefix, exclude=exclude, locked=index_pages, filter_seen=print_pov, index_pages=archive_org, print_pov=False, archive_org=None).

In other words, there’s no implicit relationship between the namespace/scope of the caller and the signature of the function/method being called. The parameters are assigned positionally unless you use keyword arguments (a=a).
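A minimal sketch (with a stub Crawler standing in for the real one, storing only the two fields of interest) makes the mis-assignment visible:

```python
# Stub with the same parameter order as the Crawler.__init__ in the question.
class Crawler:
    def __init__(self, url, depth_limit, confine=None, exclude=None,
                 locked=True, filter_seen=True, index_pages=True,
                 print_pov=False, archive_org=None):
        self.index_pages = index_pages
        self.archive_org = archive_org

index_pages = True
print_pov = False
archive_org = True

# Positional call, as in crawlwrapper(): the 5th, 6th and 7th values land
# in locked, filter_seen and index_pages -- archive_org keeps its default.
c = Crawler('http://example.com', 2, None, [], index_pages, print_pov, archive_org)
print(c.archive_org)   # None -- the True was consumed by index_pages instead

# Keyword call: each value reaches the parameter it was meant for.
c = Crawler('http://example.com', 2, index_pages=index_pages,
            print_pov=print_pov, archive_org=archive_org)
print(c.archive_org)   # True
```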

Python 3 adds support for keyword-only arguments that might have helped prevent this situation. Had you defined Crawler.__init__ thus:

def __init__(self, url, depth_limit, *, confine=None, exclude=[], locked=True, filter_seen=True, index_pages=True, print_pov=False, archive_org=None)

That * would signal that the remaining parameters cannot be specified positionally and must be indicated by name.
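As a sketch, assuming a function of the same shape (make_crawler is a hypothetical stand-in), the bare * turns a silent mis-assignment into an immediate TypeError:

```python
# Everything after the bare * must be passed by name.
def make_crawler(root, depth_limit, *, confine=None, archive_org=None):
    return archive_org

print(make_crawler('http://example.com', 2, archive_org=True))  # True

# The buggy positional call now fails loudly instead of silently
# filling the wrong parameters.
try:
    make_crawler('http://example.com', 2, None, True)
except TypeError as err:
    print('TypeError:', err)
```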

Honestly, I was stumped on the problem too until I got to the hint.

Upvotes: 2
