shridhar singh

Reputation: 45

Using class methods to return instances and simply defining __init__(self, **kwargs):

One pattern I recently started using is writing class methods to return instances. In particular, I've been using it for dataclasses (with or without the @dataclass decorator). But it has also led me to defining vague __init__ methods as follows:

def __init__(self, **kwargs):
    for k, v in kwargs.items():
        setattr(self, k, v)

As a more fleshed-out example, let's say I'm writing a metadata class that holds the details of a standardized test question. I expect all instances of the class to have the same attributes, so I use __slots__, and I have functions defined in another module to read various parts of the question from an HTML file.

import json

from bs4 import BeautifulSoup

class Metadata:
    __slots__ = ('question_id', 'testid', 'itemnum', 'subject', 'system', 'topic', 'images', 'tables', 'links')

    @classmethod
    def from_html(cls, html: BeautifulSoup):
        # the next four lines build the dict metadata with keys for
        # everything in __slots__
        metadata = MyModule.parse_details(html)
        metadata['images'] = MyModule.process_images(html)
        metadata['tables'] = MyModule.read_tables(html)
        metadata['links'] = MyModule.pull_links(html)
        return cls(**metadata)
        
    @classmethod
    def from_file(cls, filepath: str):
        with open(filepath, 'r') as f:
            metadata = json.load(f)
        return cls(**metadata)

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

To me this seems like the best way to accomplish the task, which is to create a dataclass that holds metadata and can be initialized from multiple different sources (files, dicts, other dataclasses I've defined, etc.). The downside is that __init__ is very opaque. It also feels weird to use **kwargs when __init__ has to take the same keyword arguments every time for the class to work as I intend (that's partly why I used __slots__ too: to make the definition of the dataclass clearer).

Also the documentation of the attrs package for Python says this:

For similar reasons, we strongly discourage from patterns like:

pt = Point(**row.attributes)

which couples your classes to the database data model. Try to design your classes in a way that is clean and convenient to use – not based on your database format. The database format can change anytime and you’re stuck with a bad class design that is hard to change. Embrace functions and classmethods as a filter between reality and what’s best for you to work with.

That's near the top of the page of the link I included, and I really don't understand what it's trying to say, hence my question.

So would you implement my code any differently, and what is the attrs documentation trying to say?

Upvotes: 0

Views: 163

Answers (1)

enzo

Reputation: 11496

Suppose you have the following JSON:

{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}

and you initialize your class doing

class Post:
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

import json

with open(filepath, encoding='utf-8') as f:
    data = json.load(f)
    post = Post(**data)

The following code

if not post.completed:
    pass  # do something and exit
else:
    print(post.userId)

will work as expected. However, suppose you need to rename the userId column to user_id (the "The database format can change anytime" part of the documentation). Now you need to rename ALL the occurrences of post.userId to post.user_id in all of your code. That's fine if your codebase consists of only one Python file, but what if it contains a lot of files and dependencies?
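To make that failure concrete, here is a minimal sketch. The inline dicts stand in for the parsed JSON file, and the renamed key user_id is the hypothetical change described above:

```python
class Post:
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


# Stand-ins for the parsed JSON before and after the hypothetical rename.
old_data = {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": False}
new_data = {"user_id": 1, "id": 1, "title": "delectus aut autem", "completed": False}

old_post = Post(**old_data)
new_post = Post(**new_data)

print(old_post.userId)              # the attribute exists under the old name
print(hasattr(new_post, "userId"))  # the old name is gone once the data changes
```

Every place that still reads post.userId would raise AttributeError for objects built from the renamed data.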

Now suppose you initialize your class doing

class Post:
    def __init__(self, userId, id, title, completed):
        self.userId = userId
        self.id = id
        self.title = title
        self.completed = completed

import json

with open(filepath, encoding='utf-8') as f:
    data = json.load(f)
    post = Post(
        userId=data['userId'],
        id=data['id'],
        title=data['title'],
        completed=data['completed'],
    )

Now if userId is renamed to user_id in the JSON, you only need to change ONE place in your whole codebase: where you read from the JSON file.
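The same idea can be packaged as a classmethod, which is roughly what the attrs documentation means by using classmethods as a filter. This is only a sketch: the from_json name and the snake_case attribute user_id are my own choices for illustration, and it reads from a string instead of a file to stay self-contained:

```python
import json


class Post:
    def __init__(self, user_id, id, title, completed):
        self.user_id = user_id
        self.id = id
        self.title = title
        self.completed = completed

    @classmethod
    def from_json(cls, text):
        """Build a Post from a JSON string (hypothetical helper).

        This is the only place that knows the key names used in the data.
        """
        data = json.loads(text)
        return cls(
            user_id=data["userId"],  # external name -> internal name
            id=data["id"],
            title=data["title"],
            completed=data["completed"],
        )


raw = '{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'
post = Post.from_json(raw)
print(post.user_id, post.completed)
```

The rest of the code only ever sees post.user_id, so a rename in the data touches from_json and nothing else.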

Other situations include

  • your database designer adding a column named aVeryLongFieldNameThatYouDoesNotWantYoInsertIntoYourPythonCode
  • some linter complaining the Python attributes should be snake_case instead of camelCase
  • you wanting to typecheck your code using mypy, which does not work very well with setattr
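Applied to the Metadata class from the question, one possible shape is a regular dataclass with explicit fields plus classmethods per source. Treat this as a sketch under assumptions: the field types, the reduced field list, and the from_dict helper are mine, not something stated in the question:

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    # Field types are assumptions; the question does not state them.
    question_id: str
    subject: str
    topic: str
    images: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "Metadata":
        # Explicit mapping: renamed or extra keys in `raw` only affect this method.
        return cls(
            question_id=raw["question_id"],
            subject=raw["subject"],
            topic=raw["topic"],
            images=list(raw.get("images", [])),
            links=list(raw.get("links", [])),
        )

    @classmethod
    def from_file(cls, filepath: str) -> "Metadata":
        with open(filepath, encoding="utf-8") as f:
            return cls.from_dict(json.load(f))


meta = Metadata.from_dict(
    {"question_id": "q1", "subject": "math", "topic": "algebra", "extra": 1}
)
print(meta)
```

Because the attributes are declared with annotations, a type checker such as mypy can see them, and unexpected keys like "extra" never become attributes.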

Upvotes: 0
