b0bu

Reputation: 1230

python dataclasses as oop abstractions

I have a fairly basic question about dataclasses. If I have an event dict that I pass as data to a dataclass, is it a good use of dataclasses in general to have the class parse out the data I need, or to handle conditionals so that it returns the right data based on what was passed in?

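from dataclasses import dataclass
from typing import Any

@dataclass
class Event:
    data: dict[str, Any]  # nested payload, so the values are not plain strings

    def type(self):
        return self.data["detail"]["eventName"]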

I've just started using dataclasses and looking at how crap my code usually is because it's written in a rush, with little thought given to open-closed extensibility or abstraction. So I'm trying to get my head around the composition root and when and where things should be coupled and when they shouldn't. That is, as opposed to:

@dataclass
class Event:
    type: str

Event(e["detail"]["eventName"])

It makes a lot of sense to me to have something like

@dataclass
class Event:
    type: str
    name: str
    id: int

But what if the path into the dict to reach the id changes or, more relevantly, what if the path differs depending on the type? That is actually one of the problems I've run into. If you create class EventType1 and class EventType2, I'd still need to try EventType1 first to see whether its constructor works (since the paths might happen to work) and then move on to EventType2. It seems like I'm missing something and replacing bad design with bad design, since you'd need a class for every possible event type. What are people's thoughts? Is this a bad use of dataclasses? Should all the indexing be pulled out of the class and done somewhere else?

EDIT ---

I've decided to add a more concrete example of my issue. Here I've used EventData to abstract over event formats, so that data1 and data2 become irrelevant to the code that actually consumes EventData. I still have to know, based on the incoming event format, which event constructor to use (GitPush or PullRequest in this toy example). The upside is that I could extend this interface to many events and the code consuming EventData wouldn't care.

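from abc import ABC, abstractmethod
from dataclasses import dataclass


class Event(ABC):
    """ implement me """

    # declared as abstract properties, since the subclasses implement them with @property
    @property
    @abstractmethod
    def type(self) -> str:
        pass

    @property
    @abstractmethod
    def repository(self) -> str:
        pass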


# event type 1
class GitPush(Event):
    """ implemented supported event type """
    def __init__(self, data):
        self.data = data

    @property
    def type(self) -> str:
        return self.data["detail"]["eventName"]

    @property
    def repository(self) -> str:
        return self.data["detail"]["additionalEventData"]["repositoryName"]

# event type 2
class PullRequest(Event):
    """ implemented supported event type """
    def __init__(self, data):
        self.data = data

    @property
    def type(self) -> str:
        return self.data["detail"]["eventName"]

    @property
    def repository(self) -> str:
        return self.data["detail"]["requestParameters"]["targets"][0]["repositoryName"]

@dataclass
class EventData:
    data: Event

    @property
    def type(self):
        return self.data.type

    @property
    def repository(self):
        return self.data.repository



event = GitPush(data1)

data = EventData(event)

print(data.repository)

event = PullRequest(data2)

data = EventData(event)

print(data.repository)

I am and was confused about the idea of EventType and whether it can or should be abstracted, or whether this is just a point of extensibility where new events simply have to meet the interface's requirements.

event = EventType(some_event)

data = EventData(event)

print(data.repository)

I can't think of any other way to do it other than using a conditional:

event_name = some_event["detail"]["eventName"]
if event_name == "Type1":
    e = Type1(some_event)
elif event_name == "Type2":
    e = Type2(some_event)
else:
    raise ValueError(f"unsupported event type: {event_name}")

data = EventData(e)

print(data.repository)

Upvotes: 2

Views: 1237

Answers (1)

Silvio Mayolo

Reputation: 70277

I'll share with you a pattern that I've used a lot in this exact situation. The problem is that we've got some untrusted data in a JSON-like format and we want to store it in nice data structures. It's a good instinct to get the messy business out of the way early so that, for the rest of the program, you can assume the shape of the data is good (see "Parse, don't validate", which is an excellent article on the topic).

Here's what I've done in the past.

from dataclasses import dataclass

@dataclass
class Event:
    type: str
    name: str
    id: int

    @classmethod
    def from_json(cls, data):
        # a classmethod receives the class as its first argument
        return cls(
            type=data["detail"]["eventName"],
            name=data["name"],
            id=data["id"],
        )

You've got a dataclass. It's a real, genuine @dataclass and anyone who feels so inclined can construct instances of it directly. But the intended entrypoint is the factory function from_json, which takes your dictionary and parses it into an Event object. If the data format changes, you only need to change that one function.
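As a quick illustration of how that entrypoint is used, here is a minimal sketch. The payload is made up for the example (its keys just mirror the ones above), so adjust it to whatever your real events look like.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Event:
    type: str
    name: str
    id: int

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "Event":
        # All knowledge of the payload layout lives in this one method.
        return cls(
            type=data["detail"]["eventName"],
            name=data["name"],
            id=data["id"],
        )


# Hypothetical payload, invented purely for illustration.
payload = {"detail": {"eventName": "GitPush"}, "name": "push-to-main", "id": 42}

# Intended entrypoint: parse the raw dict once, at the edge of the program.
event = Event.from_json(payload)

# Direct construction still works, which is handy in tests.
same_event = Event(type="GitPush", name="push-to-main", id=42)
print(event == same_event)
```

Everything downstream only ever sees Event, so it never needs to know how deeply the name was nested in the original dict.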

And if there's any validation you should do (maybe IDs have to be nonnegative or something, or names have a maximum length), then you can do that in from_json as well and throw an exception on bad input. Obviously you'd want to document this behavior, but it can still be checked in this one place, at the endpoint of your application that touches whatever API you're talking to.
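Here is a rough sketch of what that validation might look like. The specific rules (nonnegative integer IDs, a 64-character name limit) and the choice of ValueError are assumptions for the sake of the example, not requirements of any particular API.

```python
from dataclasses import dataclass
from typing import Any

MAX_NAME_LENGTH = 64  # hypothetical limit, chosen only for this example


@dataclass
class Event:
    type: str
    name: str
    id: int

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "Event":
        """Parse an untrusted payload; raises ValueError on bad input."""
        try:
            event_type = data["detail"]["eventName"]
            name = data["name"]
            event_id = data["id"]
        except (KeyError, TypeError) as exc:
            # A missing key or a non-dict where a dict was expected.
            raise ValueError(f"malformed event payload: {exc!r}") from exc

        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(event_id, bool) or not isinstance(event_id, int):
            raise ValueError(f"id must be an integer, got {event_id!r}")
        if event_id < 0:
            raise ValueError(f"id must be nonnegative, got {event_id}")
        if not isinstance(name, str) or len(name) > MAX_NAME_LENGTH:
            raise ValueError(f"name must be a string of at most {MAX_NAME_LENGTH} characters")

        return cls(type=event_type, name=name, id=event_id)
```

Code that calls from_json can then catch ValueError in one place and decide whether to log, skip, or reject the offending event.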

If you've got multiple different event types, then you can have a parent (abstract) class Event and all of its (concrete) subclasses, and the parent class can provide a @classmethod which constructs an instance of the appropriate subclass based on the shape of the data. Still one entrypoint.
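A sketch of that last idea, using the two event shapes from the question. The eventName values "GitPush" and "CreatePullRequest", the `_parse` helper and the `_parsers` lookup table are all assumptions made for illustration, and the parent is left as a plain dataclass rather than a formal ABC to keep the example short.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Event:
    type: str
    repository: str

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "Event":
        """Single entrypoint: choose the subclass from the event name."""
        name = data["detail"]["eventName"]
        parsers = cls._parsers()
        if name not in parsers:
            raise ValueError(f"unsupported event type: {name!r}")
        return parsers[name]._parse(data)

    @staticmethod
    def _parsers() -> dict[str, type["Event"]]:
        # Assumed event names; supporting a new event means one new
        # subclass plus one new entry here.
        return {"GitPush": GitPush, "CreatePullRequest": PullRequest}


@dataclass
class GitPush(Event):
    @classmethod
    def _parse(cls, data: dict[str, Any]) -> "GitPush":
        detail = data["detail"]
        return cls(
            type=detail["eventName"],
            repository=detail["additionalEventData"]["repositoryName"],
        )


@dataclass
class PullRequest(Event):
    @classmethod
    def _parse(cls, data: dict[str, Any]) -> "PullRequest":
        detail = data["detail"]
        return cls(
            type=detail["eventName"],
            repository=detail["requestParameters"]["targets"][0]["repositoryName"],
        )


# Hypothetical payloads standing in for data1 and data2 from the question.
data1 = {"detail": {"eventName": "GitPush",
                    "additionalEventData": {"repositoryName": "repo-a"}}}
data2 = {"detail": {"eventName": "CreatePullRequest",
                    "requestParameters": {"targets": [{"repositoryName": "repo-b"}]}}}

for raw in (data1, data2):
    event = Event.from_json(raw)
    print(type(event).__name__, event.repository)
```

The conditional from the question has not disappeared; it has just moved into one lookup inside the parent class, so consuming code calls Event.from_json and never has to pick a constructor itself.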

Upvotes: 3
