Reputation: 253
I have the following class
@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None
and the two subclasses:
@dataclass_json
@dataclass
class Csv(Source):
    csv_path: str = None
    delimiter: str = ';'
and
@dataclass_json
@dataclass
class Parquet(Source):
    parquet_path: str = None
Given the following dictionaries:
parquet = {'type': 'Parquet', 'label': 'events', 'path': '/.../test.parquet', 'parquet_path': '../../result.parquet'}
csv = {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter': ','}
Now I would like to do something like
Source().from_dict(csv)
and have the output be an instance of Csv or Parquet. I understand that if you instantiate the class Source, you just "upload" the parameters with the method from_dict, but is there any possibility of doing this through some kind of inheritance, without a "constructor" that does an if/elif/else over all possible types?
Pureconfig, a Scala library, creates different case classes when the attribute 'type' holds the name of the desired subclass. Is this possible in Python?
Upvotes: 4
Views: 2253
Reputation: 703
Do you need this behavior?
from dataclasses import dataclass
from typing import Optional, Union, List
from validated_dc import ValidatedDC

@dataclass
class Source(ValidatedDC):
    label: Optional[str] = None
    path: Optional[str] = None

@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'

@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None

@dataclass
class InputData(ValidatedDC):
    data: List[Union[Parquet, Csv]]

# Let's say you got a json-string and loaded it:
data = [
    {
        'label': 'events', 'path': '/.../test.parquet',
        'parquet_path': '../../result.parquet'
    },
    {
        'label': 'events', 'path': '/.../test.csv',
        'csv_path': '../../result.csv', 'delimiter': ','
    }
]
input_data = InputData(data=data)
for item in input_data.data:
    print(item)
# Parquet(label='events', path='/.../test.parquet', parquet_path='../../result.parquet')
# Csv(label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
validated_dc: https://github.com/EvgeniyBurdin/validated_dc
Upvotes: 0
Reputation: 542
This is a variation on my answer to this question.
@dataclass_json
@dataclass
class Source:
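    type: str = None
    label: str = None
    path: str = None

    def __new__(cls, type=None, **kwargs):
        # Look for a direct subclass whose name matches the 'type' argument
        for subclass in cls.__subclasses__():
            if subclass.__name__ == type:
                break
        else:
            # No match: fall back to the class that was actually called
            subclass = cls
        instance = super(Source, subclass).__new__(subclass)
        return instance
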
assert type(Source(**csv)) == Csv
assert type(Source(**parquet)) == Parquet
assert Csv(**csv) == Source(**csv)
assert Parquet(**parquet) == Source(**parquet)
You asked and I am happy to oblige. However, I'm questioning whether this is really what you need. I think it might be overkill for your situation. I originally figured this trick out so I could instantiate directly from data when...
If those conditions apply to your situation, then I think this is a worthwhile approach. If not, the added complexity of mucking with __new__ (a moderately advanced maneuver) might not outweigh the savings in complexity in the code used to instantiate. There are probably simpler alternatives.
For example, it appears as though you already know which subclass you need; it's one of the fields in the data. If you put it there, presumably whatever logic you wrote to do so could be used to instantiate the appropriate subclass right then and there, bypassing the need for my solution. Alternatively, instead of storing the name of the subclass as a string, store the subclass itself. Then you could do this: data['type'](**data)
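As a rough sketch of that last alternative, the dictionary can carry the class object under 'type'. This uses only the standard library dataclasses module (no dataclass_json), and annotating the field as Type["Source"] is an assumption made for this illustration:

```python
from dataclasses import dataclass
from typing import Optional, Type


@dataclass
class Source:
    # Holds the subclass itself rather than its name (assumption for this sketch).
    type: Optional[Type["Source"]] = None
    label: Optional[str] = None
    path: Optional[str] = None


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


# The 'type' entry is the class object, so no name lookup is needed.
data = {'type': Csv, 'label': 'events', 'path': '/.../test.csv',
        'csv_path': '../../result.csv', 'delimiter': ','}

source = data['type'](**data)
print(source.delimiter)  # prints: ,
```

Note that this only works while the data lives in Python: a class object cannot be written to JSON as-is, so data loaded from a file would still need a lookup by name.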
It also occurs to me that maybe you don't need inheritance at all. Do Csv and Parquet store the same type of data, differing only in which file format they read it from? Then maybe you just need one class with from_csv and from_parquet methods. Alternatively, if one of the parameters is a filename, it would be easy to figure out which type of file parsing you need based on the filename extension. Normally I'd put this in __init__, but since you're using dataclass, I guess this would happen in __post_init__.
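A minimal sketch of the extension-based idea, assuming a single Source class and a hypothetical file_format field that is filled in by __post_init__ (neither the field name nor the 'unknown' fallback comes from the question):

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Optional


@dataclass
class Source:
    label: Optional[str] = None
    path: Optional[str] = None
    delimiter: str = ';'  # only meaningful for CSV input
    # Hypothetical field: derived from the path, so it is not an __init__ argument.
    file_format: str = field(init=False, default='unknown')

    def __post_init__(self):
        # Map the filename extension to a format name; anything else stays 'unknown'.
        if self.path is not None:
            suffix = PurePath(self.path).suffix.lower()
            self.file_format = {'.csv': 'csv', '.parquet': 'parquet'}.get(suffix, 'unknown')


csv_source = Source(label='events', path='/.../test.csv', delimiter=',')
parquet_source = Source(label='events', path='/.../test.parquet')
print(csv_source.file_format, parquet_source.file_format)  # prints: csv parquet
```

The reading code can then branch on file_format in one place, and the data no longer needs a 'type' entry at all.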
Upvotes: 3
Reputation: 50076
You can build a helper that picks and instantiates the appropriate subclass.
def from_data(data: dict, tp: type):
    """Create the subtype of ``tp`` for the given ``data``"""
    subtype = [
        stp for stp in tp.__subclasses__()  # look through all subclasses...
        if stp.__name__ == data['type']     # ...and select by type name
    ][0]
    return subtype(**data)  # instantiate the subtype
This can be called with your data and the base class from which to select:
>>> from_data(
... {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','},
... Source,
... )
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
If you need to run this often, it is worth building a dict to optimise the subtype lookup. A simple means is to add a method to your base class, and store the lookup there:
@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None

    @classmethod
    def from_data(cls, data: dict):
        # Build the name -> subclass mapping once, on first use
        if not hasattr(cls, '_lookup'):
            cls._lookup = {stp.__name__: stp for stp in cls.__subclasses__()}
        return cls._lookup[data["type"]](**data)
This can be called directly on the base class:
>>> Source.from_data({'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','})
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
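One caveat with the cached lookup: __subclasses__() lists only direct subclasses, and a dict built on the first call will not see subclasses defined afterwards. A possible variation, sketched here with only the standard library dataclasses module, registers each subclass as it is defined via __init_subclass__. The _registry name and the ValueError for unknown types are assumptions of this sketch, not part of the original code:

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, Optional, Type


@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None

    # Hypothetical registry; ClassVar keeps it out of the dataclass fields.
    _registry: ClassVar[Dict[str, Type["Source"]]] = {}

    def __init_subclass__(cls, **kwargs):
        # Called once for every subclass, direct or indirect, when it is defined.
        super().__init_subclass__(**kwargs)
        Source._registry[cls.__name__] = cls

    @classmethod
    def from_data(cls, data: dict) -> "Source":
        try:
            subtype = cls._registry[data['type']]
        except KeyError:
            raise ValueError(f"unknown source type: {data.get('type')!r}") from None
        return subtype(**data)


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None


source = Source.from_data({'type': 'Csv', 'label': 'events', 'path': '/.../test.csv',
                           'csv_path': '../../result.csv', 'delimiter': ','})
print(type(source).__name__)  # prints: Csv
```

The trade-off is a little more machinery in the base class in exchange for a lookup that stays current without rebuilding the dict.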
Upvotes: 3