ThaNoob

Reputation: 642

How to prevent repeated computation of computed fields that depend on each other?

I am used to normal Python classes, and I am trying to learn Pydantic now. It's been a lot harder than I expected. What I often do is instantiate a class with some initial input and, based on that input, "calculate" a lot of attributes for that class. I can't figure out how to create such "calculated" attributes in Pydantic.

I created the following example to demonstrate the issue:

from pydantic import BaseModel, computed_field
from typing import List
class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        print("initializing composite_name")
        return f"{self.first_name} {self.last_name}"

    @computed_field
    @property
    def composite_name_list(self) -> List[str]:
        print("initializing name_list")
        return [f"{self.composite_name} {i}" for i in range(5)]

p = Person(first_name="John", last_name="Doe")
print(p.composite_name_list)

I would expect the code above to run composite_name and create the composite_name attribute, then run composite_name_list and create the composite_name_list attribute. It would thus go through each of these functions exactly once, printing "initializing composite_name" once and then "initializing name_list".

Instead, the print-out I get is:

initializing name_list
initializing composite_name
initializing composite_name
initializing composite_name
initializing composite_name
initializing composite_name
['John Doe 0', 'John Doe 1', 'John Doe 2', 'John Doe 3', 'John Doe 4']

A couple of odd things in this printout:

  1. The first thing printed is "initializing name_list", even though the print statement for "initializing composite_name" comes first in the code.
  2. It seems to recalculate the composite_name attribute every time it is accessed, even though I used the computed_field decorator.
  3. I added the last line, print(p.composite_name_list), because otherwise it wouldn't print anything at all! In other words, instantiating the class Person does not seem to automatically create my two computed properties.

In standard Python, I would have just written the class like this:

class PersonStandardPython:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        self.composite_name = f"{first_name} {last_name}"
        self.composite_name_list = [f"{self.composite_name} {i}" for i in range(5)]

How can I get a result similar to my standard Python implementation while still having the benefit of Pydantic's strong typing?

Upvotes: 2

Views: 1248

Answers (2)

Axel Donath

Reputation: 1648

Intro

I think there are some misunderstandings about how computed_field works and how it is meant to be used. computed_field acts very much like a property in Python, which is why it is used together with the property decorator. It mimics the appearance of an attribute, while computing its value only "on request" (see Python docs). The computed_field decorator then only adds this property to the Pydantic model's list of fields, so that it can be used for e.g. serialization.
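
To illustrate the serialization point, here is a minimal sketch, assuming Pydantic v2 (where computed_field and model_dump are available). It only shows that the computed value appears in the dumped dict even though it is never passed in or stored as a regular field:

```python
from pydantic import BaseModel, computed_field


class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        # Evaluated on access, and also when the model is serialized
        return f"{self.first_name} {self.last_name}"


p = Person(first_name="John", last_name="Doe")

# The computed field is included next to the regular fields
dumped = p.model_dump()
print(dumped)
```

With a plain property (without computed_field) the attribute would still work on access, but it would not show up in the dump.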

Mutability

In general, computed fields / properties can be used to re-compute a value based on mutable attributes. In your first example, one could modify first_name or last_name and composite_name would still return the correct name, for example:

p = Person(first_name="John", last_name="Doe")
print(p.composite_name)

p.first_name = "Jane"
print(p.composite_name)

Which should print:

John Doe
Jane Doe

In contrast, in your second example, if you modified first_name, composite_name would still be set to the value it has been assigned on init, like so:

p = PersonStandardPython(first_name="John", last_name="Doe")
print(p.composite_name)

p.first_name = "Jane"
print(p.composite_name)

Which should print:

John Doe
John Doe

So the two cases exhibit totally different behavior with regard to mutability. If you want your Person object to be mutable, your first example is entirely correct! You just have to look at it again and understand its behavior. So let me address the three points you mentioned:

The first thing printed is "initializing name_list" while the print statement of "initializing composite_name" comes first.

This is entirely expected. Because computed_field works like a property, accessing composite_name_list runs its body first, and composite_name is only evaluated once that body reads it.

It seems to recalculate the composite name attribute every time it is called, even though I used the computed field decorator.

Again, this is entirely expected. As it works just like a property, it re-executes the code defined in the method on every access. However, you can cache the result of the computed property (more on this later).

I added the last line "print(p.composite_name_list)" because otherwise it wouldn't print anything at all! Or in other words, instantiating the class Person does not automatically seem to cause the creation of my two computed properties.

This is also expected, because the code is only executed when the computed field is accessed. It is "delayed" and not computed on initialization of the object.
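
All three observations can be reproduced with a plain Python property, without Pydantic at all. The following is only a sketch: the class name PlainPerson and the calls list are hypothetical, introduced here to record which property bodies run and in what order.

```python
class PlainPerson:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        self.calls = []  # records which property bodies ran, in order

    @property
    def composite_name(self):
        self.calls.append("composite_name")
        return f"{self.first_name} {self.last_name}"

    @property
    def composite_name_list(self):
        self.calls.append("composite_name_list")
        # composite_name is read once per loop iteration, so it runs 5 times
        return [f"{self.composite_name} {i}" for i in range(5)]


p = PlainPerson("John", "Doe")
calls_after_init = list(p.calls)    # nothing has run yet

names = p.composite_name_list
calls_after_access = list(p.calls)  # outer property first, then 5 inner calls
print(calls_after_init)
print(calls_after_access)
```

Nothing is recorded after construction, and the single access to composite_name_list records one entry for itself followed by five for composite_name, which matches the print-out in the question.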

Faux Immutability

Alternatively, with Pydantic you can achieve "faux immutability" (see faux immutability docs). This way you can compute the derived attributes on init or before, and prevent the attributes they are based on from being modified later. For this you can use frozen=True in the class definition and, for example, a model_validator:

from pydantic import BaseModel, model_validator
from typing import List, Optional


class Person(BaseModel, frozen=True):
    first_name: str
    last_name: str
    composite_name: Optional[str] = None
    composite_name_list: Optional[List[str]] = None

    @model_validator(mode="before")
    @classmethod
    def init_derived_attribute(cls, data, info):
        first_name = data.get("first_name")
        last_name = data.get("last_name")
        composite_name = f"{first_name} {last_name}"

        data["composite_name"] = composite_name
        data["composite_name_list"] = [f"{composite_name} {i}" for i in range(5)]
        return data

p = Person(first_name="John", last_name="Doe")
print(p.composite_name)
p.first_name = "Jane" # this now raises an error!
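
For completeness, here is a small sketch of what the failing assignment looks like when it is caught, assuming Pydantic v2, where assigning to a field of a frozen model raises a ValidationError. The class name FrozenPerson is hypothetical and the derived fields are left out to keep it short:

```python
from pydantic import BaseModel, ValidationError


class FrozenPerson(BaseModel, frozen=True):
    first_name: str
    last_name: str


p = FrozenPerson(first_name="John", last_name="Doe")

try:
    p.first_name = "Jane"  # not allowed on a frozen model
    raised = False
except ValidationError:
    raised = True

# To "change" a value, build a new instance instead of mutating the old one
jane = FrozenPerson(first_name="Jane", last_name=p.last_name)
print(raised, p.first_name, jane.first_name)
```

Creating a new instance also means that any derived values are computed from scratch for the new data.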

Proposed Solution

While the example above works fine, I think it is not the cleanest solution. You mention you would mostly like to avoid the re-computation of the field. The solution for this is simple: you can use cached_property from the standard functools library. However, in this case you should still combine it with faux immutability, so that the object cannot be modified in memory and the derived property cannot go out of sync. Here is the final code I would propose:

from pydantic import BaseModel, computed_field
from typing import List
from functools import cached_property

class Person(BaseModel, frozen=True):
    first_name: str
    last_name: str

    @computed_field
    @cached_property
    def composite_name(self) -> str:
        print("initializing composite_name")
        return f"{self.first_name} {self.last_name}"

    @computed_field
    @cached_property
    def composite_name_list(self) -> List[str]:
        print("initializing name_list")
        return [f"{self.composite_name} {i}" for i in range(5)]


p = Person(first_name="John", last_name="Doe")
print(p.composite_name_list)

Which prints:

initializing name_list
initializing composite_name
['John Doe 0', 'John Doe 1', 'John Doe 2', 'John Doe 3', 'John Doe 4']

While the execution order stays the same (see above, this is expected), composite_name is now computed only once and therefore printed only once. For all subsequent accesses the cached value is returned. One important note here is that it is typically only reasonable to use cached_property if the computation is rather "expensive". If you really just concatenate two strings, doing it repeatedly might be just fine.
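
To show why the cache should be paired with immutability, here is a sketch using only the standard library. MutablePerson is a hypothetical plain class (not a Pydantic model) that caches a derived value but still allows its inputs to change:

```python
from functools import cached_property


class MutablePerson:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @cached_property
    def composite_name(self):
        return f"{self.first_name} {self.last_name}"


p = MutablePerson("John", "Doe")
first = p.composite_name   # computed and cached on first access

p.first_name = "Jane"
stale = p.composite_name   # cache is not invalidated: still the old name

del p.composite_name       # deleting the attribute clears the cached value
fresh = p.composite_name   # recomputed from the current attributes
print(first, "|", stale, "|", fresh)
```

After the mutation the cached value no longer matches the attributes it was derived from, until the cache is cleared by hand. Freezing the model avoids this situation altogether.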

Summary

If you intend your Person class to be mutable, your first proposed solution is just fine! The re-computation ensures that the derived fields / properties are always "up to date" with the attributes they are derived from. Alternatively, you can switch to one of the solutions with "faux immutability" I proposed above, which both avoid the re-computation.

Upvotes: 5

ThaNoob

Reputation: 642

I don't know if this is the best method, but it works. You declare the derived fields with a default of None and then use the model_post_init() method to set them. Pydantic calls this method under the hood, right after the model has been initialized.

from pydantic import BaseModel, Field
from typing import List, Any


class Person(BaseModel):
    first_name: str
    last_name: str
    composite_name: str = Field(default=None)  # Pre-declared fields, initialized to None
    composite_name_list: List[str] = Field(default=None)

    # Must be indented as a method of Person so that Pydantic picks it up
    def model_post_init(self, __context: Any) -> None:
        self.composite_name = f"{self.first_name} {self.last_name}"
        self.composite_name_list = [f"{self.composite_name} {i}" for i in range(5)]

a = Person(first_name="John", last_name="Doe")
print(a.composite_name_list)

Upvotes: 1
