Reputation: 23
I've written a Python application in a broadly functional style, using frozen dataclasses as the inputs and outputs of functions. These dataclasses typically hold a DataFrame and perhaps one other attribute, for example:
    @dataclass(frozen=True)
    class TimeSeries:
        log: pd.DataFrame
        sourceName: str
I now have more possible data objects, which follow an 'is-a' inheritance structure. So perhaps a TimeSeries has a DataFrame with only the columns Time and A, and an ExtendedTimeSeries has one with these columns plus a B column, and so on. I now have 4 different TimeSeries types, which in an OO paradigm would fall into a hierarchy.
What is the best structure for this?
I could use (OO-style) composition rather than inheritance, and have the ExtendedTimeSeries data structure contain a TimeSeries object and a standalone Temperature series, but that seems neither efficient (the pieces have to be merged before any DataFrame operation) nor safe (the rows could be mismatched).
Without the DataFrames this compositional approach would seem to work ok. Any good design tips?
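To make the composition option concrete, here is a minimal sketch. The field names base and b, the merged() helper, and the index check are assumptions made for illustration; they are not part of the actual application.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class TimeSeries:
    log: pd.DataFrame  # columns: Time, A
    sourceName: str


@dataclass(frozen=True)
class ExtendedTimeSeries:
    base: TimeSeries  # hypothetical field name
    b: pd.Series      # hypothetical standalone column, indexed like base.log

    def __post_init__(self):
        # Guard against the mismatched-rows risk at construction time.
        if not self.base.log.index.equals(self.b.index):
            raise ValueError("b must share the index of base.log")

    def merged(self) -> pd.DataFrame:
        # The merge step that every DataFrame operation would need first.
        return self.base.log.assign(B=self.b)


base = TimeSeries(pd.DataFrame({"Time": [0, 1], "A": [1.0, 2.0]}), "my_src")
ext = ExtendedTimeSeries(base, pd.Series([10.0, 20.0]))
print(list(ext.merged().columns))  # ['Time', 'A', 'B']
```

The index check removes the row-mismatch risk at construction, but the merged() call before every operation is exactly the overhead described above.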
I could have a series of dataclasses inheriting from each other, but they would have exactly the same variables (in the example above, log and sourceName), and I'm not sure that is possible/sensible.
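For reference, a frozen dataclass can subclass another frozen dataclass without adding any fields, so the inheritance option is at least possible. The sketch below is one way it might look, assuming a hypothetical class-level REQUIRED tuple that each subclass overrides and a column check in __post_init__; neither exists in the original code.

```python
from dataclasses import dataclass
from typing import ClassVar, Tuple

import pandas as pd


@dataclass(frozen=True)
class TimeSeries:
    log: pd.DataFrame
    sourceName: str

    # Hypothetical schema; ClassVar keeps it out of the dataclass fields.
    REQUIRED: ClassVar[Tuple[str, ...]] = ("Time", "A")

    def __post_init__(self):
        # Validate against the schema of the concrete class being built.
        missing = [c for c in self.REQUIRED if c not in self.log.columns]
        if missing:
            raise ValueError(f"{type(self).__name__} is missing columns: {missing}")


@dataclass(frozen=True)
class ExtendedTimeSeries(TimeSeries):
    # Same fields as the parent; only the required columns differ.
    REQUIRED: ClassVar[Tuple[str, ...]] = ("Time", "A", "B")


ext = ExtendedTimeSeries(
    pd.DataFrame({"Time": [0], "A": [1.0], "B": [2.0]}), "my_src"
)
print(isinstance(ext, TimeSeries))  # True
```

Functions that accept a TimeSeries then also accept an ExtendedTimeSeries, and the DataFrame stays in one piece, so no merge is needed.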
Upvotes: 2
Views: 125
Reputation: 3641
In this scenario I would discriminate the cases with a src_type attribute, which can then be used to identify the type of data. This src_type could be determined automatically in a __post_init__ method (circumventing the frozen status) and then used in the functional evaluation.
    from enum import Enum
    from dataclasses import dataclass
    import pandas as pd

    # predefined source types for easier discrimination
    class SrcType(Enum):
        STANDARD = 0
        EXTENDED = 1

    @dataclass(frozen=True)
    class TimeSeries:
        log: pd.DataFrame
        src_name: str
        src_type: SrcType = None

        def __post_init__(self):
            # criteria for various source types
            if 'B' in self.log.columns:
                src_type = SrcType.EXTENDED
            else:
                src_type = SrcType.STANDARD
            # bypassing the frozen attribute
            object.__setattr__(self, 'src_type', src_type)

    series = TimeSeries(pd.DataFrame(), "my_src")
    print(series.src_type)  # <- SrcType.STANDARD

    series = TimeSeries(pd.DataFrame({'B': [0]}), "my_src")
    print(series.src_type)  # <- SrcType.EXTENDED
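As a rough illustration of how the tag might then be used in the functional evaluation, the sketch below branches on src_type inside an ordinary function. The summarise function and its output keys are hypothetical, and the class definitions are repeated only so that the snippet runs on its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

import pandas as pd


class SrcType(Enum):
    STANDARD = 0
    EXTENDED = 1


@dataclass(frozen=True)
class TimeSeries:
    log: pd.DataFrame
    src_name: str
    src_type: Optional[SrcType] = None

    def __post_init__(self):
        src_type = SrcType.EXTENDED if "B" in self.log.columns else SrcType.STANDARD
        object.__setattr__(self, "src_type", src_type)


def summarise(series: TimeSeries) -> dict:
    # Hypothetical evaluation: branches on the tag, not on a class hierarchy.
    result = {"mean_A": series.log["A"].mean()}
    if series.src_type is SrcType.EXTENDED:
        result["mean_B"] = series.log["B"].mean()
    return result


standard = TimeSeries(pd.DataFrame({"Time": [0, 1], "A": [1.0, 3.0]}), "my_src")
extended = TimeSeries(
    pd.DataFrame({"Time": [0, 1], "A": [1.0, 3.0], "B": [2.0, 4.0]}), "my_src"
)
print(sorted(summarise(standard)))  # ['mean_A']
print(sorted(summarise(extended)))  # ['mean_A', 'mean_B']
```

Because __post_init__ always overwrites src_type, any value passed to the constructor is ignored; the tag is derived purely from the columns.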
Upvotes: 1