Reputation: 907
In Pydantic 2, fields of type set are already JSON-serialized to lists. However, these lists are not sorted. More specifically, their items follow the internal (iteration) order of the original set.
Unfortunately, even when two sets contain the same items, their internal ordering might still be different. Consequently, serializing sets without explicitly ordering them produces nondeterministic results.
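A small illustration of this, assuming CPython (set iteration order is an implementation detail, not a language guarantee): two equal sets of ints built by inserting the same items in a different order can iterate differently, and Pydantic's JSON output simply follows each set's iteration order. The use of `TypeAdapter` here is just a convenient way to serialize a bare set.

```python
from pydantic import TypeAdapter

# Same items, different insertion order. In a small CPython set, 8 and 0
# map to the same hash slot, so whichever is inserted first comes first.
a = set([8, 0])
b = set([0, 8])

adapter = TypeAdapter(set[int])

print(a == b)  # True: the sets compare equal...
# ...but each JSON serialization follows that set's own iteration order.
print(adapter.dump_json(a), adapter.dump_json(b))
```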
I am looking for a way to configure a particular Pydantic 2 model to JSON-serialize all of its fields of type set to a sorted list first, before converting the outcome to a string. I would like to avoid defining a custom set type or adding a custom type annotation for every such attribute. The solution should be more generic because the number of attributes that might need this kind of handling is large. Moreover, they might also be defined in subclasses of that particular model.
Is there a reasonably simple way to achieve this?
It seems to me that using a model serializer is the most straightforward way to do it. But at the same time it seems cumbersome to me to loop through all the attributes, check their type, call the serialization routines for the items and then sort the results into a list. If possible, I would like to avoid that and leverage Pydantic's knowledge about the attribute types in some way.
In Pydantic 1, the desired effect could be achieved by using the json_encoders configuration parameter and defining a custom serialization function for all attributes of type set. However, in Pydantic 2 this option has been removed due to "performance overhead and implementation complexity", which seems understandable.
I am not necessarily looking for a way to mimic the behavior of Pydantic 1. If there is a way to achieve a similar effect using primary or recommended Pydantic 2 features, I would prefer to use it.
It seems that at some point during JSON serialization, Pydantic converts the sets to lists anyway. Ideally, I would like to tap into this conversion and merely call sorted on its outcome.
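One way to get close to this with standard Pydantic 2 features is a wrap-mode @model_serializer on a shared base class: the handler performs the regular serialization (including the set-to-list conversion) and the wrapper only sorts the resulting lists. This is a sketch under stated assumptions, not an official recipe: it detects set fields from their runtime values, assumes no field aliases (so output keys equal field names), only touches top-level fields, and requires the set items to be mutually comparable. The class names SortedSetsModel and Course are hypothetical.

```python
from pydantic import (
    BaseModel,
    SerializationInfo,
    SerializerFunctionWrapHandler,
    model_serializer,
)


class SortedSetsModel(BaseModel):
    """Hypothetical base class: JSON output lists set-valued fields sorted."""

    @model_serializer(mode='wrap')
    def sort_set_fields(
        self, handler: SerializerFunctionWrapHandler, info: SerializationInfo
    ):
        # Regular Pydantic serialization first; in JSON mode this already
        # converts every set to a list, in the set's iteration order.
        data = handler(self)
        if not info.mode_is_json():
            return data  # model_dump() in Python mode keeps real sets
        for name in type(self).model_fields:
            # Detect set fields by their runtime value, then sort what Pydantic
            # produced for them. Assumes output keys equal field names (no
            # aliases) and mutually comparable items.
            if name in data and isinstance(getattr(self, name), (set, frozenset)):
                data[name] = sorted(data[name])
        return data


# Fields declared in a subclass are covered, because the serializer is inherited.
class Course(SortedSetsModel):
    students: set[str]
    rooms: frozenset[int]
    title: str = 'Intro'


course = Course(students={'bob', 'alice'}, rooms=frozenset({3, 1, 2}))
print(course.model_dump_json())
#> {"students":["alice","bob"],"rooms":[1,2,3],"title":"Intro"}
```

The design choice here is to let Pydantic keep doing all item-level serialization and only reorder its output, so nested values inside the sets are serialized exactly as they would be otherwise.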
Upvotes: 3
Views: 2144
Reputation: 1864
Another approach I see is probably more cumbersome than what you hoped for and than the model_serializer you proposed, but it only targets explicitly selected attributes:
Serializing a set as a sorted list in Pydantic 2 (2.6, to be precise) can be done with a @field_serializer decorator (source: Pydantic documentation > Functional serializers).
Here is the example given in the referenced documentation:
```python
from typing import Set

from pydantic import BaseModel, field_serializer


class StudentModel(BaseModel):
    name: str = 'Jane'
    courses: Set[str]

    # Applied only when serializing to JSON; model_dump() still returns a set.
    @field_serializer('courses', when_used='json')
    def serialize_courses_in_order(self, courses: Set[str]):
        return sorted(courses)


student = StudentModel(courses={'Math', 'Chemistry', 'English'})
print(student.model_dump_json())
#> {"name":"Jane","courses":["Chemistry","English","Math"]}
```
The attribute courses is serialized as a sorted list.
You can also apply a field_serializer to multiple attributes:
```python
from typing import Set
from pydantic import BaseModel, field_serializer

class AnotherModel(BaseModel):
    a: Set[str]
    b: Set[str]

    @field_serializer('a', 'b', when_used='json')
    def serialize_sets(self, set_of_str: Set[str]):
        return sorted(set_of_str)
```
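If the field_serializer route is preferred but some of the fields live in subclasses, one possible variant is to declare the serializer once on a base class with check_fields=False, which tells Pydantic not to verify that the listed fields exist on the base class itself. This is a sketch: the class names SortedTagsBase and User and the field names tags and roles are hypothetical, and the field names still have to be listed explicitly.

```python
from pydantic import BaseModel, field_serializer


class SortedTagsBase(BaseModel):
    # Hypothetical base class. 'tags' and 'roles' are only declared in
    # subclasses, so check_fields=False skips the "field exists" check here.
    @field_serializer('tags', 'roles', when_used='json', check_fields=False)
    def serialize_sorted(self, value: set[str]) -> list[str]:
        return sorted(value)


class User(SortedTagsBase):
    tags: set[str]
    roles: set[str]


user = User(tags={'b', 'a'}, roles={'z', 'y'})
print(user.model_dump_json())
#> {"tags":["a","b"],"roles":["y","z"]}
```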
Upvotes: 1