I'm trying to figure out how to validate and transform data within a Pydantic model. I'm retrieving data from an API on jobs (dummy example below) and need to map the fields to a Pydantic model. The JSON is converted to a Python dictionary first.
The problem is that the keys in the dictionary are different from the names of the model fields. I also need to apply some transformations to get the data in the format I want. In all the examples I found in the Pydantic documentation where a model is created from a different object (such as an ORM object), the fields are identical in name, like here (orm-mode).
Originally, I used a helper function to map the dictionary keys to the model fields, but I was asked to perform the transformation when instantiating the model. I can't figure out how to pass the dictionary (or its keys) as an argument to the model, map the keys to the model fields, and validate the data.
from pydantic import BaseModel, condecimal
from datetime import datetime
from typing import Optional

# example data
job_data = {
    'id': 10,
    'address': {
        'city': 'Brooklyn',
        'state': 'New York'
    },
    'durationWeeks': 1.0,
    'hoursPerWeek': 34.0,
    'publicDescription': 'This is a public description',
    'salary': 0.0,
    'startDate': 1640581200000,
    'title': 'Physician'
}

class Location(BaseModel):
    city: str
    state: str

class JobModel(BaseModel):
    id: int  # I'm aware this is a built-in function in Python
    location: Location
    duration: Optional[int] = None
    hours_per_week: Optional[int] = None
    description: str
    pay: Optional[condecimal(ge=0, decimal_places=2)] = None
    start_date: datetime
    title: str

# Mapping:
# id -> id
# location -> address
# duration -> durationWeeks
# hours_per_week -> hoursPerWeek
# description -> publicDescription
# pay -> salary
# start_date -> from_timestamp(startDate)  # function to convert to a date object
# title -> title
What can I do? Defining __init__ and passing the object as JobModel(job_data) raises an exception because the arguments don't match. And if I just set __init__(job_data: Dict), I can't set the fields I need. Will I need to map the job data as I declare an instance of the model?
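The exception comes from how BaseModel.__init__ is declared: it only accepts keyword arguments. The following minimal sketch assumes Pydantic v2 and a trimmed-down, hypothetical version of the model. It shows the failing call alongside the two calls that do accept the dictionary.

```python
from pydantic import BaseModel


class JobModel(BaseModel):  # trimmed-down, hypothetical version of the model
    id: int
    title: str


job_data = {"id": 10, "title": "Physician"}

try:
    JobModel(job_data)  # a dict passed positionally is rejected
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

job = JobModel(**job_data)  # unpack the dict into keyword arguments
job2 = JobModel.model_validate(job_data)  # or hand the dict over as-is
print(job == job2)
```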
I know there's also Field(alias='new_name') (example here) that I could use as a workaround, but is this the best approach?
Then there's this post showing examples with GetterDict and from_orm.
Which approach requires the least code?
The best approach probably depends on what exactly you are trying to achieve.
Since you wrote that you are getting data from an API, I would assume that you are ingesting them as JSON.
I don't know of any downsides to using aliases in a case like that. Just be aware of the different ways you can alias fields, and that they have a precedence (as described e.g. here).
Regarding the conversion from timestamp to datetime, according to Pydantic's docs:
pydantic will process either a unix timestamp int (e.g. 1496498400) or a string representing the date & time.
In other words, when you use datetime as the field type, "it should just work".
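Pydantic also accepts millisecond timestamps: it treats numbers larger than 2e10 in magnitude as milliseconds. That is why startDate (1640581200000) needs no manual division. The following is a small check, assuming Pydantic v2 and a hypothetical Event model.

```python
from datetime import timedelta, datetime

from pydantic import BaseModel


class Event(BaseModel):  # hypothetical model with a single datetime field
    start_date: datetime


in_seconds = Event(start_date=1640581200)  # Unix timestamp in seconds
in_millis = Event(start_date=1640581200000)  # same instant in milliseconds

print(in_seconds.start_date)
print(in_seconds.start_date == in_millis.start_date)
```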
Below is how I would approach it with data coming from an API (as JSON). I dropped location and description, as job_data has no keys with those names, and gave start_date an alias so that it is read from the startDate key. (In Pydantic v1 the same alias could be declared with class Config: fields = {'start_date': 'startDate'}, but that config option was removed in v2, the version that provides model_validate.)
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field, condecimal

job_data = {
    'id': 10,
    'address': {
        'city': 'Brooklyn',
        'state': 'New York'
    },
    'durationWeeks': 1.0,
    'hoursPerWeek': 34.0,
    'publicDescription': 'This is a public description',
    'salary': 0.0,
    'startDate': 1640581200000,
    'title': 'Physician'
}

class JobModel(BaseModel):
    id: int
    duration: Optional[int] = None
    hours_per_week: Optional[int] = None
    pay: Optional[condecimal(ge=0, decimal_places=2)] = None
    # read this field from the 'startDate' key of the input
    start_date: datetime = Field(alias='startDate')
    title: str

## parse_obj is DEPRECATED
# jd = JobModel.parse_obj(job_data)
jd = JobModel.model_validate(job_data)
print(jd)
# id=10
# duration=None
# hours_per_week=None
# pay=None
# start_date=datetime.datetime(2021, 12, 27, 5, 0, tzinfo=datetime.timezone.utc)
# title='Physician'
duration, hours_per_week and pay keep their default None because durationWeeks, hoursPerWeek and salary match neither a field name nor an alias. Alias them in the same way to pick those values up.
From the docs, model_validate behaves the same as the deprecated parse_obj:
parse_obj: this is very similar to the __init__ method of the model, except it takes a dict rather than keyword arguments. If the object passed is not a dict a ValidationError will be raised.
There is also model_validate_json, which takes a JSON string instead of a dict.
This should make it a bit safer, and you could wrap it in a try-except block (catching ValidationError) if you wanted to make it fool-proof.
However, you might as well initialise the model directly; the result will be the same:
jd = JobModel(**job_data)
print(jd)
# id=10
# duration=None
# hours_per_week=None
# pay=None
# start_date=datetime.datetime(2021, 12, 27, 5, 0, tzinfo=datetime.timezone.utc)
# title='Physician'
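The model_validate_json and try-except suggestions can be combined when the raw API response is still a string. The following is a minimal sketch, assuming Pydantic v2. The parse_job_json helper is hypothetical, and returning None on failure is just one possible policy.

```python
import json
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class JobModel(BaseModel):
    id: int
    start_date: datetime = Field(alias="startDate")
    title: str


def parse_job_json(raw: str) -> Optional[JobModel]:
    """Hypothetical helper: validate a JSON string, return None if it is invalid."""
    try:
        return JobModel.model_validate_json(raw)
    except ValidationError as exc:
        # exc.errors() lists each problem with its location and message
        for error in exc.errors():
            print(error["loc"], error["msg"])
        return None


good = parse_job_json(json.dumps({"id": 10, "startDate": 1640581200000, "title": "Physician"}))
bad = parse_job_json(json.dumps({"id": "ten", "title": "Physician"}))  # bad id, no startDate
```

Parsing and validating in one step skips the intermediate json.loads call. All problems in the payload are reported together in a single ValidationError.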