One pattern I've recently started using is writing classmethods that return instances (alternative constructors). In particular, I've been using it for dataclasses (with or without the @dataclass decorator). But it has also led me to define vague __init__ methods like this:
def __init__(self, **kwargs):
    for k, v in kwargs.items():
        setattr(self, k, v)
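To make the opacity concrete: with a catch-all __init__ like this, a misspelled keyword is not reported when the object is built. A minimal sketch; the class name Loose and the field names are hypothetical:

```python
class Loose:
    # catch-all initializer: every keyword argument becomes an attribute
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


# 'subjcet' is a typo for 'subject', but construction succeeds silently
item = Loose(question_id=1, subjcet="cardiology")

print(hasattr(item, "subject"))  # the intended attribute was never set
print(item.subjcet)              # the typo became an attribute instead
```

The mistake only surfaces later, when some other code reads item.subject and gets an AttributeError.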
As a more fleshed-out example, let's say I'm writing a metadata class that holds the details of a standardized test question. I expect all instances of the class to have the same attributes, so I use __slots__, and I have functions defined in another module that read various parts of the question from an HTML file.
import json

from bs4 import BeautifulSoup

import MyModule  # my own module with the HTML-parsing functions

class Metadata:
    __slots__ = ('question_id', 'testid', 'itemnum', 'subject', 'system',
                 'topic', 'images', 'tables', 'links')

    @classmethod
    def from_html(cls, html: BeautifulSoup):
        # the next four lines build the dict metadata with a key for
        # everything in __slots__
        metadata = MyModule.parse_details(html)
        metadata['images'] = MyModule.process_images(html)
        metadata['tables'] = MyModule.read_tables(html)
        metadata['links'] = MyModule.pull_links(html)
        return cls(**metadata)

    @classmethod
    def from_file(cls, filepath: str):
        with open(filepath, 'r') as f:
            metadata = json.load(f)
        return cls(**metadata)

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
To me this seems like the best way to accomplish the task, which is to create a dataclass that holds metadata and can be initialized from multiple different sources (files, dicts, other dataclasses I've defined, etc.). The downside is that __init__ is very opaque. It also feels odd to use **kwargs when __init__ has to receive the same keyword arguments every time for the class to work as intended (that's partly why I used __slots__ too: to make the definition of the dataclass clearer).
Also, the documentation of the attrs package for Python says this:
For similar reasons, we strongly discourage from patterns like:
pt = Point(**row.attributes)
which couples your classes to the database data model. Try to design your classes in a way that is clean and convenient to use – not based on your database format. The database format can change anytime and you’re stuck with a bad class design that is hard to change. Embrace functions and classmethods as a filter between reality and what’s best for you to work with.
That passage is near the top of the page I linked, and I really don't understand what it's trying to say, hence my question.
So would you implement my code any differently, and what is the attrs documentation trying to say?
Suppose you have the following JSON:
{
"userId": 1,
"id": 1,
"title": "delectus aut autem",
"completed": false
}
and you initialize your class like this:
import json

class Post:
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

with open(filepath, encoding='utf-8') as f:
    data = json.load(f)
post = Post(**data)
The following code
if not post.completed:
    ...  # do something and exit
else:
    print(post.userId)
will work as expected. However, suppose you need to rename the userId column to user_id (the "The database format can change anytime" part of the documentation). Now you need to rename ALL occurrences of post.userId to post.user_id throughout your code. That's manageable if your codebase consists of a single Python file, but what if it spans many files and dependencies?
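A minimal sketch of that failure mode, using the **kwargs-style Post and simulating the renamed JSON key with a dict literal (no file is read here):

```python
class Post:
    # catch-all initializer, as in the first version of Post
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


# the JSON after the rename: "userId" is now "user_id"
data = {"user_id": 1, "id": 1, "title": "delectus aut autem", "completed": False}
post = Post(**data)  # loading still succeeds

try:
    print(post.userId)
except AttributeError as exc:
    # the error surfaces where the attribute is used, not where the data is loaded
    print("AttributeError:", exc)
```

In the snippet above, print(post.userId) sits in the else branch, so a missed rename would only fail once a completed post comes along.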
Now suppose you initialize your class like this instead:
import json

class Post:
    def __init__(self, userId, id, title, completed):
        self.userId = userId
        self.id = id
        self.title = title
        self.completed = completed

with open(filepath, encoding='utf-8') as f:
    data = json.load(f)
post = Post(
    userId=data['userId'],
    id=data['id'],
    title=data['title'],
    completed=data['completed'],
)
Now if userId is renamed to user_id in the JSON, you only need to change ONE place in your whole codebase: the line that reads it from the JSON data, which becomes userId=data['user_id']. Every post.userId elsewhere keeps working.
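Other situations where an explicit __init__ helps include:
- JSON keys such as aVeryLongFieldNameThatYouDoNotWantToInsertIntoYourPythonCode, which you can map to a shorter attribute name
- using snake_case attribute names instead of the data's camelCase
- type checkers such as mypy, which cannot see attributes that are only created with setattr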
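Putting these together, the "filter between reality and what's best for you" that the attrs documentation recommends can be an alternative constructor: the class uses snake_case names chosen for Python code, and one classmethod translates the JSON keys. A sketch; from_json_dict is a hypothetical name, and a dataclass stands in for a hand-written __init__ for brevity:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Post:
    # attribute names chosen for Python code, not copied from the JSON
    user_id: int
    id: int
    title: str
    completed: bool

    @classmethod
    def from_json_dict(cls, data: dict[str, Any]) -> "Post":
        # the only place that knows the JSON key names; if a key is
        # renamed in the data, only this mapping needs to change
        return cls(
            user_id=data["userId"],
            id=data["id"],
            title=data["title"],
            completed=data["completed"],
        )


post = Post.from_json_dict(
    {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": False}
)
print(post.user_id)
```

Because the attributes are declared with annotations, mypy can check uses such as post.user_id, which it cannot do for attributes created with setattr.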