Reputation: 2845
I'm writing a library to access a REST API. It returns JSON containing a user object. I convert it to a dict, and then convert that to a dataclass object. The problem is that not all fields are fixed. I want to add additional fields (which are not specified in my dataclass) dynamically. I can simply assign values to my object, but they don't appear in the object's representation, and the dataclasses.asdict function doesn't add them to the resulting dict:
from dataclasses import asdict, dataclass
@dataclass
class X:
    i: int
x = X(i=42)
x.s = 'text'
x
# X(i=42)
x.s
# 'text'
asdict(x)
# {'i': 42}
Upvotes: 33
Views: 42684
Reputation: 11232
You could use make_dataclass to create X on the fly:
from dataclasses import asdict, dataclass, make_dataclass
X = make_dataclass('X', [('i', int), ('s', str)])
x = X(i=42, s='text')
asdict(x)
# {'i': 42, 's': 'text'}
Or as a derived class:
from dataclasses import asdict, dataclass, make_dataclass
@dataclass
class X:
    i: int
x = X(i=42)
x.__class__ = make_dataclass('Y', fields=[('s', str)], bases=(X,))
x.s = 'text'
asdict(x)
# {'i': 42, 's': 'text'}
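If the extra keys are only known once the JSON arrives, the derived-class idea can be wrapped in a small helper. This is only a sketch under a few assumptions: `extend_with` is a hypothetical name (not part of dataclasses), it returns a new instance instead of reassigning `__class__`, it types each new field by the runtime type of its value, and it assumes the base dataclass has no fields with defaults and that every extra key is a valid Python identifier.

```python
from dataclasses import asdict, dataclass, fields, make_dataclass


@dataclass
class X:
    i: int


def extend_with(obj, extra: dict):
    """Return an instance of a derived dataclass that also holds `extra`.

    Hypothetical helper: the derived class is built on the fly, and each
    new field is annotated with the runtime type of its value.
    """
    base = type(obj)
    known = {f.name for f in fields(obj)}
    # Only keys the base dataclass does not already declare become new fields.
    new_fields = [(k, type(v)) for k, v in extra.items() if k not in known]
    derived = make_dataclass(f'{base.__name__}Extended', new_fields, bases=(base,))
    # Copy the existing field values, then let `extra` add to (or override) them.
    current = {f.name: getattr(obj, f.name) for f in fields(obj)}
    return derived(**{**current, **extra})


x = extend_with(X(i=42), {'s': 'text'})
print(asdict(x))
# {'i': 42, 's': 'text'}
```

Because the new class derives from X, `isinstance(x, X)` still holds, so code that expects an X keeps working.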
Upvotes: 49
Reputation: 11612
Update (6/22): As it's now mid-2022, I thought I'd refresh my answer with a brand new approach I've been toying around with. I am pleased to announce a fast, modern library I have quite recently released, called dotwiz.
The dotwiz library can be installed with pip:
pip install dotwiz
This is a tiny helper library that I've created, which makes dict objects safe to access by dot notation, such as a.b.c instead of a['b']['c']. From personal tests and benchmarks, it's actually a lot faster than something like make_dataclass; more info on this below.
Additionally, one can subclass DotWiz or DotWizPlus, which enables type hinting and auto-completion in an IDE such as PyCharm. Here is a simple example:
from dataclasses import asdict, make_dataclass
from dotwiz import DotWiz
class MyTypedWiz(DotWiz):
    # add attribute names and annotations for better type hinting!
    i: int
    s: str
dw = MyTypedWiz(i=42, s='text')
print(dw)
# ✫(i=42, s='text')
print(dw.to_dict())
# {'i': 42, 's': 'text'}
If you still prefer to use dataclasses to model your data, I've included my original answer below that is mostly unchanged from years past.
The following results were timed on a Mac Pro with the M1 chip, Python 3.10.4, and with n=5000 iterations.
Creating or instantiating the object:
$ python -m timeit -n 5000 -s "from dotwiz import DotWiz" -c "DotWiz(i=42, s='text')"
5000 loops, best of 5: 425 nsec per loop
$ python -m timeit -n 5000 -s "from dataclasses import make_dataclass" -c "X = make_dataclass('X', [('i', int), ('s', str)]); X(i=42, s='text')"
5000 loops, best of 5: 97.8 usec per loop
These times are probably inflated, but in this particular case it looks like DotWiz is about 230x faster than make_dataclass (97.8 usec vs. 425 nsec). In practice, I would say it's about 100 times faster on average.
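Note that the make_dataclass command above rebuilds the class on every iteration, so it measures class creation plus instantiation. A rough sketch for separating the two costs, using only the standard library, might look like the following. The iteration count `N` is an arbitrary choice, and no particular numbers are claimed here since results vary by machine.

```python
import timeit
from dataclasses import make_dataclass

FIELDS = [('i', int), ('s', str)]
N = 1000  # arbitrary; kept small because class creation is slow

# Build the class and instantiate it on every iteration,
# mirroring what the make_dataclass command above measures.
create_and_init = timeit.timeit(
    lambda: make_dataclass('X', FIELDS)(i=42, s='text'), number=N
)

# Instantiate only, with the class built once up front.
X = make_dataclass('X', FIELDS)
init_only = timeit.timeit(lambda: X(i=42, s='text'), number=N)

print(f'create + init: {create_and_init / N * 1e6:.2f} usec per loop')
print(f'init only:     {init_only / N * 1e6:.2f} usec per loop')
```

If the class can be created once and reused, the second figure is the more relevant one for comparison.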
Accessing a key by dot notation:
$ python -m timeit -n 5000 -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" -c "dw.s.lower()"
5000 loops, best of 5: 39.7 nsec per loop
$ python -m timeit -n 5000 -s "from dataclasses import make_dataclass" -s "X = make_dataclass('X', [('i', int), ('s', str)])" -s "x = X(i=42, s='text')" -c "x.s.lower()"
5000 loops, best of 5: 39.9 nsec per loop
The times to access an attribute or a key look to be mostly the same.
Serializing the object to JSON:
$ python -m timeit -n 5000 -s "import json" -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" -c "json.dumps(dw)"
5000 loops, best of 5: 1.1 usec per loop
$ python -m timeit -n 5000 -s "import json" -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" -c "json.dumps(dw.to_dict())"
5000 loops, best of 5: 1.46 usec per loop
$ python -m timeit -n 5000 -s "import json" -s "from dataclasses import asdict, make_dataclass" -s "X = make_dataclass('X', [('i', int), ('s', str)])" -s "x = X(i=42, s='text')" -c "json.dumps(asdict(x))"
5000 loops, best of 5: 2.87 usec per loop
So, it actually looks like it's about 2.5x faster to serialize a DotWiz object, as compared to a dataclass instance.
As mentioned, fields marked as optional should resolve the issue. If not, consider using properties in dataclasses. Regular properties should work well enough, though you'll have to declare the field in __post_init__, and that's slightly inconvenient.
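A minimal sketch of the optional-field approach, using only the standard library. The `from_dict` helper name and the decision to silently drop keys the dataclass does not declare are assumptions made for illustration:

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class X:
    i: int
    # Declared up front with a default, so the API response may omit it.
    s: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> 'X':
        """Build an instance, ignoring keys the dataclass does not declare."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


a = X.from_dict({'i': 42})
b = X.from_dict({'i': 42, 's': 'text', 'unexpected': True})

print(asdict(a))
# {'i': 42, 's': None}
print(asdict(b))
# {'i': 42, 's': 'text'}
```

This only helps when the set of possible fields is known ahead of time; truly unknown keys are discarded here.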
If you want the property to have a default value, so that calling the getter right after creating the object works, and you also want to be able to set that value via the constructor, you can use a concept called field properties; a couple of libraries, such as dataclass-wizard, provide full support for that.
Example usage:
from dataclasses import asdict, dataclass
from typing import Optional
from dataclass_wizard import property_wizard
@dataclass
class X(metaclass=property_wizard):
    i: int
    s: Optional[str] = None

    @property
    def _s(self):
        """Returns a title-cased value, i.e. `stRiNg` -> `String`"""
        return self._s.title() if self._s else None

    @_s.setter
    def _s(self, s: str):
        """Reverses a string, i.e. `olleH` -> `Hello`"""
        self._s = s[::-1] if s else None

x = X(i=42)
x
# X(i=42, s=None)
assert x.s is None # True
x.s = '!emordnilap'
x
# X(i=42, s='Palindrome!')
x.s
# 'Palindrome!'
asdict(x)
# {'i': 42, 's': 'Palindrome!'}
Disclaimer: I am the creator (and maintainer) of this library.
Upvotes: 10