Josip Pavičić

Reputation: 13

How to correctly type a Pydantic model to handle string input for a list[float] with validation before initialization?

I'm using Pydantic to define a model where one of the fields, embedding, is expected to be a list[float]. However, I want to be able to pass a string to this field, and then have a validator transform this string into a list[float] before initialization.

Here's the code I'm working with:

```python
import uuid

from pydantic import BaseModel, field_validator


class ChunkInsert(BaseModel):
    embedding: list[float]
    file_id: uuid.UUID

    @field_validator(
        "embedding",
        mode="before",
    )
    @classmethod
    def embed_files(cls, value: str) -> list[float]:
        # embed_text is my own helper (not shown here); [0] picks the first embedding it returns
        return embed_text(value)[0]


chunk_in = ChunkInsert(
    embedding="a",
    file_id=uuid.UUID("987f5c8a-5577-4662-be1d-cb1ba016f6f5"),
)
```

The code works as expected, and embed_files processes the string and converts it into a list[float]. However, I'm getting the following type error in VS Code from Pylance:

Argument of type "Literal['a']" cannot be assigned to parameter "embedding" of type "list[float]" in function "__init__"
  "Literal['a']" is incompatible with "list[float]" Pylance(reportArgumentType)

It seems like Pylance is not recognizing that the embedding field should be processed by the embed_files validator before the type check.

So my question is: is there a way to configure Pydantic or Pylance so that this kind of pre-initialization validation doesn't trigger a type error?

Edit: since Pylance is a static type checker and I am changing the type dynamically before the model is created, is this even possible?
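On the edit's question: the type checker only reads the field annotation, so as long as embedding is annotated list[float], passing a str to the constructor will be flagged regardless of what a "before" validator does at runtime. A minimal sketch of one workaround, assuming it is acceptable to build the model through a separate constructor: keep embedding: list[float] and do the string-to-embedding conversion in a classmethod before calling the real __init__. The from_text name is illustrative, and embed_text below is a hypothetical stub standing in for the asker's real function.

```python
import uuid

from pydantic import BaseModel


def embed_text(text: str) -> list[list[float]]:
    # Hypothetical stand-in for the real embed_text: returns one vector per input
    return [[float(len(text)), 0.5]]


class ChunkInsert(BaseModel):
    embedding: list[float]
    file_id: uuid.UUID

    @classmethod
    def from_text(cls, text: str, file_id: uuid.UUID) -> "ChunkInsert":
        # Convert the string first, so __init__ only ever receives a list[float]
        return cls(embedding=embed_text(text)[0], file_id=file_id)


chunk_in = ChunkInsert.from_text(
    "a",
    file_id=uuid.UUID("987f5c8a-5577-4662-be1d-cb1ba016f6f5"),
)
print(chunk_in.embedding)
```

Every call site is now well-typed, because the str-accepting signature is an ordinary method rather than the model's synthesized __init__.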

Upvotes: 1

Views: 391

Answers (1)

Nathan Chappell

Reputation: 2446

Here is one solution that probably does what you want. Note that if you want the embedding to be calculated only when it is first accessed, you could turn the property into a cached_property and compute it there. I'm assuming you might sometimes want to pass in a precomputed embedding, so the solution supports that as well.

```python
from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

from pydantic import BaseModel, Field, computed_field, model_validator


class ChunkInsert(BaseModel):
    text: str
    # Optional precomputed embedding; excluded from model_dump() and repr
    embedding_: list[float] | None = Field(default=None, exclude=True, repr=False)

    @computed_field
    @property
    def embedding(self) -> list[float]:
        # Always set by the validator below; the assert narrows the type for the checker
        assert self.embedding_ is not None
        return self.embedding_

    @model_validator(mode="after")
    def embed_files(self) -> Self:
        if self.embedding_ is None:
            # Placeholder: replace with e.g. embed_text(self.text)[0]
            self.embedding_ = [1.0]
        return self


# all of these make the typechecker happy

print(ChunkInsert(text="foobar"))
print(ChunkInsert(text="moodbar", embedding_=[1.0]))
print(ChunkInsert(text="foobar").model_dump())
print(ChunkInsert(text="moobar", embedding_=[1.0]).model_dump())
print(ChunkInsert(text="moodbar", embedding_=[1.0]).embedding[0])

# output:
# text='foobar' embedding=[1.0]
# text='moodbar' embedding=[1.0]
# {'text': 'foobar', 'embedding': [1.0]}
# {'text': 'moobar', 'embedding': [1.0]}
# 1.0
```
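The cached_property variant mentioned at the start of this answer could look roughly like the sketch below. It assumes Pydantic v2, where computed_field can wrap functools.cached_property, and it drops the optional embedding_ input for brevity. embed_text and EMBED_CALLS are hypothetical stand-ins used only to make the caching visible.

```python
from functools import cached_property

from pydantic import BaseModel, computed_field

EMBED_CALLS: list[str] = []


def embed_text(text: str) -> list[list[float]]:
    # Hypothetical stand-in for a real embedding function; records each call
    EMBED_CALLS.append(text)
    return [[float(len(text))]]


class ChunkInsert(BaseModel):
    text: str

    @computed_field
    @cached_property
    def embedding(self) -> list[float]:
        # Runs on first access only; the result is then stored on the instance
        return embed_text(self.text)[0]


chunk = ChunkInsert(text="foobar")
print(chunk.embedding)     # first access computes the embedding
print(chunk.model_dump())  # serialization reuses the cached value
```

Because the value is cached on the instance, changing text after the first access does not recompute the embedding.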

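For anyone who wants a slightly less correct but less verbose solution, the following does the same thing. It is less correct because an explicitly passed empty list cannot be distinguished from "no embedding given" and will be overwritten: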

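```python
from typing import Self  # Python 3.11+

from pydantic import BaseModel, model_validator


class ChunkInsert(BaseModel):
    text: str
    # An empty list means "not provided" (Pydantic copies mutable defaults per instance)
    embedding: list[float] = []

    @model_validator(mode="after")
    def embed_files(self) -> Self:
        if not self.embedding:
            # Placeholder: replace with e.g. embed_text(self.text)[0]
            self.embedding = [1.0]
        return self

# same typechecker happiness, same output
# (pass embedding=[1.0] instead of embedding_=[1.0])
```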

Upvotes: 0
