Reputation: 4424
I have the following very simple dataclass:
import dataclasses

@dataclasses.dataclass
class Test:
    value: int
I create an instance of the class but instead of an integer I use a string:
>>> test = Test('1')
>>> type(test.value)
<class 'str'>
What I actually want is a forced conversion to the data type I defined in the class definition:
>>> test = Test('1')
>>> type(test.value)
<class 'int'>
Do I have to write the __init__ method manually, or is there a simple way to achieve this?
Upvotes: 72
Views: 41831
Reputation: 398
Here is a version of @decese's answer which works whether or not you're using "from __future__ import annotations"-style imports, and can also work across as many dataclasses as you'd like:
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, fields, is_dataclass


class TypedDataclass(ABC):
    """For a dataclass instance, use its type hints to enforce types.

    This solution is adapted to work with 'from __future__ import annotations'-style type hints.
    """

    @abstractmethod
    def __init__(self):
        """Use this class as a base class for inheritance with dataclasses only."""

    def __post_init__(self) -> None:
        """Provide __post_init__ to a dataclass just by using inheritance.

        Make sure to use super() when providing your own post-initialization logic.
        """
        if not is_dataclass(self):
            raise TypeError(f"'{self.__class__.__name__}' is not a dataclass.")
        for field in fields(self):
            value = getattr(self, field.name)
            # If the module uses 'from __future__ import annotations', an annotation is just a string.
            # This eval is safe unless you're generating your type hints from external input.
            annotation = eval(field.type) if isinstance(field.type, str) else field.type  # noqa: S307
            if isinstance(value, annotation):
                continue
            # A union annotation like `int | str` is not a plain class, so it can't be called directly.
            if not isinstance(annotation, type):
                raise NotImplementedError(
                    "To coerce union types, please use a third-party library like attrs or Pydantic."
                )
            try:
                setattr(self, field.name, annotation(value))
            except ValueError as exc:
                raise TypeError(
                    f"Unable to coerce '{field.name}' from {type(value)} to {field.type}, "
                    f"value - {value!r}"
                ) from exc


@dataclass(slots=True)
class Test(TypedDataclass):
    test_field: int


print(Test(test_field="123"))
Upvotes: 0
Reputation: 1773
Here's how I would keep the dataclass while following the principle of least surprise, i.e. the type declared in the dataclass is what is expected and returned.
import dataclasses

@dataclasses.dataclass
class Test:
    value: int

    @classmethod
    def from_text(cls, text: str):
        return cls(int(text))

Test(1) == Test.from_text('1')
There are a few benefits to this example:

- from_text could have more advanced parsing when the supplied string should produce multiple arguments. Furthermore, you could add additional class methods if there were other sources of instantiating input, say a different text format or another kind of API altogether. You'd want to ensure they all produce comparable instances, though.
- classmethod is used more generally, so it is more likely to be understood than the dataclass-specific __post_init__ and field hooks.

The overall result is an effective block of code with clear, easily understood intent.

As for some of the other answers that are versions of the official Python docs: dataclasses do not support declaring separate types for incoming arguments and stored values (which is fair enough; they're supposed to be simplifying). Working around a workaround is as convoluted as it sounds.
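To illustrate the "more advanced parsing" point, here is a minimal sketch of a classmethod that parses one string into multiple fields; the Point class and the "x,y" format are hypothetical, not from the question:

```python
import dataclasses

@dataclasses.dataclass
class Point:
    x: int
    y: int

    @classmethod
    def from_text(cls, text: str) -> "Point":
        # Parse an "x,y" string into the two integer fields.
        x_part, y_part = text.split(",")
        return cls(int(x_part), int(y_part))

print(Point.from_text("3,4"))  # Point(x=3, y=4)
```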
Upvotes: 0
Reputation: 31
I had the problem of converting numpy arrays to lists and this did the job:
import dataclasses

def fix_field_types(self):
    # Note: this assumes each field's type annotation is an actual
    # callable type (e.g. list), not a string annotation.
    for key, value in dataclasses.asdict(self).items():
        field = self.__dataclass_fields__[key]
        if field.type != type(value):
            new_value = field.type(value)
            setattr(self, field.name, new_value)
Upvotes: 0
Reputation: 33
Why not use setattr?
from dataclasses import dataclass, fields

@dataclass()
class Test:
    value: int

    def __post_init__(self):
        for field in fields(self):
            setattr(self, field.name, field.type(getattr(self, field.name)))
Which yields the required result:
>>> test = Test('1')
>>> type(test.value)
<class 'int'>
Upvotes: 1
Reputation: 11670
You could use a generic type-conversion descriptor, declared in descriptors.py:
import sys

class TypeConv:
    __slots__ = (
        '_name',
        '_default_factory',
    )

    def __init__(self, default_factory=None):
        self._default_factory = default_factory

    def __set_name__(self, owner, name):
        self._name = "_" + name

        if self._default_factory is None:
            # determine the default factory from the type annotation
            tp = owner.__annotations__[name]

            if isinstance(tp, str):
                # evaluate the forward reference
                base_globals = getattr(sys.modules.get(owner.__module__, None), '__dict__', {})
                idx_pipe = tp.find('|')
                if idx_pipe != -1:
                    tp = tp[:idx_pipe].rstrip()
                tp = eval(tp, base_globals)

            # use `__args__` to handle `Union` types
            self._default_factory = getattr(tp, '__args__', [tp])[0]

    def __get__(self, instance, owner):
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        setattr(instance, self._name, self._default_factory(value))
Usage in main.py would be like:
from __future__ import annotations

from dataclasses import dataclass

from descriptors import TypeConv

@dataclass
class Test:
    value: int | str = TypeConv()

test = Test(value=1)
print(test)

test = Test(value='12')
print(test)

# watch out: the following assignment raises a `ValueError`
try:
    test.value = '3.21'
except ValueError as e:
    print(e)
Output:
Test(value=1)
Test(value=12)
invalid literal for int() with base 10: '3.21'
Note that while this does work for other simple types, it does not handle conversions for certain types - such as bool or datetime - as normally expected.
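To make the bool caveat concrete: the built-in bool() checks truthiness, not string content, so any non-empty string converts to True:

```python
# bool() on a string only checks emptiness, so the string "False"
# is a non-empty string and therefore truthy.
print(bool("False"))  # True
print(bool(""))       # False
# Simple types like int parse the string contents as expected.
print(int("42"))      # 42
```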
If you are OK with using third-party libraries for this, I have come up with a (de)serialization library called the dataclass-wizard that can perform type conversion as needed, but only when from_dict() is called:
from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import JSONWizard

@dataclass
class Test(JSONWizard):
    value: int
    is_active: bool

test = Test.from_dict({'value': '123', 'is_active': 'no'})
print(repr(test))
assert test.value == 123
assert not test.is_active

test = Test.from_dict({'is_active': 'tRuE', 'value': '3.21'})
print(repr(test))
assert test.value == 3
assert test.is_active
Upvotes: 0
Reputation: 21
You could use a descriptor-typed field:
from dataclasses import dataclass

class IntConversionDescriptor:
    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, instance, owner):
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        setattr(instance, self._name, int(value))

@dataclass
class Test:
    value: IntConversionDescriptor = IntConversionDescriptor()
>>> test = Test(value=1)
>>> type(test.value)
<class 'int'>
>>> test = Test(value="12")
>>> type(test.value)
<class 'int'>
>>> test.value = "145"
>>> type(test.value)
<class 'int'>
>>> test.value = 45.12
>>> type(test.value)
<class 'int'>
Upvotes: 1
Reputation: 10998
With Python dataclasses, the alternative is to use the __post_init__ method, as pointed out in other answers:
@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        self.value = int(self.value)
>>> test = Test("42")
>>> type(test.value)
<class 'int'>
Or you can use the attrs package, which allows you to easily set converters:
@attr.define
class Test:
    value: int = attr.field(converter=int)
>>> test = Test("42")
>>> type(test.value)
<class 'int'>
If your data comes from a mapping instead, you can use the cattrs package, which does conversion based on the type annotations in attrs classes and dataclasses:
@dataclasses.dataclass
class Test:
    value: int
>>> test = cattrs.structure({"value": "42"}, Test)
>>> type(test.value)
<class 'int'>
Pydantic will automatically do conversion based on the types of the fields in the model:
class Test(pydantic.BaseModel):
    value: int
>>> test = Test(value="42")
>>> type(test.value)
<class 'int'>
Upvotes: 9
Reputation: 351
This is easy to achieve with pydantic.validate_arguments. Just apply the validate_arguments decorator to your dataclass:
from dataclasses import dataclass

from pydantic import validate_arguments

@validate_arguments
@dataclass
class Test:
    value: int
Then try your demo; the string '1' will be converted from str to int:
>>> test = Test('1')
>>> type(test.value)
<class 'int'>
If you pass a value of a truly wrong type, it will raise an exception:
>>> test = Test('apple')
Traceback (most recent call last):
...
pydantic.error_wrappers.ValidationError: 1 validation error for Test
value
value is not a valid integer (type=type_error.integer)
Upvotes: 25
Reputation: 932
Yeah, the easy answer is to just do the conversion yourself in your own __init__(). I do this because I want my objects frozen=True.
For the type validation, Pydantic claims to do it, but I haven't tried it yet: https://pydantic-docs.helpmanual.io/
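As a minimal sketch of the frozen=True case (using __post_init__ rather than a fully hand-written __init__): a frozen dataclass forbids ordinary attribute assignment, so the conversion has to bypass the frozen check with object.__setattr__:

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class Test:
    value: int

    def __post_init__(self):
        # frozen=True blocks normal assignment (self.value = ...),
        # so set the converted value via object.__setattr__.
        object.__setattr__(self, "value", int(self.value))

test = Test("1")
print(type(test.value))  # <class 'int'>
```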
Upvotes: 1
Reputation: 522523
The type hints of dataclass attributes are never obeyed in the sense that types are enforced or checked. Static type checkers like mypy are expected to do this job; Python won't do it at runtime, as it never does.
If you want to add manual type checking code, do so in the __post_init__ method:
@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        if not isinstance(self.value, int):
            raise ValueError('value not an int')
        # or self.value = int(self.value)
You could use dataclasses.fields(self) to get a tuple of Field objects which specify the field and the type, and loop over that to do this for each field automatically, without writing it out for each one individually:
def __post_init__(self):
    for field in dataclasses.fields(self):
        value = getattr(self, field.name)
        if not isinstance(value, field.type):
            raise ValueError(f'Expected {field.name} to be {field.type}, '
                             f'got {repr(value)}')
        # or setattr(self, field.name, field.type(value))
Upvotes: 65
Reputation: 2011
You could achieve this using the __post_init__ method:
import dataclasses

@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        self.value = int(self.value)
This method is called right after the __init__ method: https://docs.python.org/3/library/dataclasses.html#post-init-processing
Upvotes: 17