mjohnson777

Reputation: 71

How to transform data for Pydantic Models?

I'm trying to figure out how to validate and transform data within a Pydantic model. I'm retrieving job data from an API (dummy example below) and need to map the fields to a Pydantic model. The JSON is converted to a Python dictionary first.

The problem is that the keys in the dictionary are different from the names of the model fields. I also need to apply some transformations to get the data in the format I want. In all the examples I found in the Pydantic documentation where a model is created from a different object (such as an ORM object), the fields are identical in name.

Like here (orm-mode).

Originally, I used a helper function to map the dictionary keys to the model fields, but I was asked to perform the transformation when instantiating the model. I can't figure out how to pass the dictionary (or its keys) as an argument to the model, map the keys to the model fields, and validate the data.

from pydantic import BaseModel, condecimal
from datetime import datetime
from typing import Optional


# example data
job_data = {
  'id': 10,
  'address': {
    'city': 'Brooklyn', 
    'state': 'New York'
  }, 
  'durationWeeks': 1.0, 
  'hoursPerWeek': 34.0, 
  'publicDescription': 'This is a public description', 
  'salary': 0.0, 
  'startDate': 1640581200000, 
  'title': 'Physician'
}

class Location(BaseModel):
  city: str
  state: str

class JobModel(BaseModel):
  id: int  # I'm aware this is a built-in function in Python
  location: Location
  duration: Optional[int] = None
  hours_per_week: Optional[int] = None
  description: str
  pay: Optional[condecimal(ge=0, decimal_places=2)] = None
  start_date: datetime
  title: str

# Mapping:
# id -> id
# location -> address
# duration -> durationWeeks
# hours_per_week -> hoursPerWeek
# description -> publicDescription
# pay -> salary
# start_date -> from_timestamp(startDate)  # function to convert to a date object
# title -> title

What can I do? Defining __init__ and passing the object as JobModel(job_data) raises an exception because the arguments don't match. And if I just set __init__(job_data: Dict) I can't set the fields I need. Will I need to map the job data as I declare an instance of the model?

I know there's also Field(alias='new_name') (example here) that I could use as a workaround, but is this the best approach?

Then there's this post showing examples with GetterDict and from_orm.

Which approach requires the least code?

Upvotes: 4

Views: 23198

Answers (1)

Paul P

Reputation: 3907

The best approach probably depends on what exactly you are trying to achieve.

Since you wrote that you are getting data from an API, I assume that you are ingesting it as JSON.

I don't know of any downsides to using aliases in a case like that. Just be aware of the different ways you can alias fields, and that they have a precedence (as described e.g. here).

Regarding the conversion from timestamp to datetime: According to Pydantic's docs:

pydantic will process either a unix timestamp int (e.g. 1496498400) or a string representing the date & time.

In other words, when you use a datetime as the field type, it should just work.
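A small sketch to illustrate this, using a hypothetical one-field Event model (not part of the question). It assumes that Pydantic interprets very large integers, such as the startDate value in job_data, as milliseconds rather than seconds:

```python
from datetime import datetime, timezone

from pydantic import BaseModel


class Event(BaseModel):
    # Hypothetical model, used only to show the timestamp coercion.
    start_date: datetime


# The same instant, expressed in seconds and in milliseconds.
from_seconds = Event(start_date=1640581200)
from_millis = Event(start_date=1640581200000)

expected = datetime(2021, 12, 27, 5, 0, tzinfo=timezone.utc)
print(from_seconds.start_date == expected)
print(from_millis.start_date == expected)
```

So a separate from_timestamp conversion step should not be necessary for values like the one in the question.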

Below is how I would approach it with data coming from an API (as JSON). I dropped location and description, as job_data has no keys with those names, and added an alias for start_date:

from pydantic import BaseModel, Field, condecimal
from datetime import datetime
from typing import Optional

job_data = {
  'id': 10,
  'address': {
    'city': 'Brooklyn',
    'state': 'New York'
  },
  'durationWeeks': 1.0,
  'hoursPerWeek': 34.0,
  'publicDescription': 'This is a public description',
  'salary': 0.0,
  'startDate': 1640581200000,
  'title': 'Physician'
}

class JobModel(BaseModel):
  id: int
  # duration, hours_per_week and pay have no alias, so they keep their default
  duration: Optional[int] = None
  hours_per_week: Optional[int] = None
  # no trailing comma here: `= None,` would make the default the tuple (None,)
  pay: Optional[condecimal(ge=0, decimal_places=2)] = None
  # the alias maps the incoming 'startDate' key onto start_date
  # (Pydantic v1 also allowed Config.fields = {'start_date': 'startDate'};
  # that option was removed in v2)
  start_date: datetime = Field(alias='startDate')
  title: str

## parse_obj is DEPRECATED
# jd = JobModel.parse_obj(job_data)
jd = JobModel.model_validate(job_data)

print(jd)
# id=10
# duration=None
# hours_per_week=None
# pay=None
# start_date=datetime.datetime(2021, 12, 27, 5, 0, tzinfo=datetime.timezone.utc)
# title='Physician'
# (the tzinfo part may print differently depending on the Pydantic version)

According to the docs, model_validate behaves the same as the deprecated parse_obj, which they describe as follows:

parse_obj: this is very similar to the __init__ method of the model, except it takes a dict rather than keyword arguments. If the object passed is not a dict a ValidationError will be raised.

There is also model_validate_json, which validates a JSON string directly.

This makes parsing a bit safer, and you could wrap the call in a try-except block (catching ValidationError) if you want to make it fool-proof.

However, you might as well initialise the model directly; the result will be the same:

jd = JobModel(**job_data)

print(jd)
# id=10
# duration=None
# hours_per_week=None
# pay=None
# start_date=datetime.datetime(2021, 12, 27, 5, 0, tzinfo=datetime.timezone.utc)
# title='Physician'
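The try-except idea mentioned above could look roughly like this. parse_job is a hypothetical helper name, and the model is trimmed to three fields to keep the sketch short:

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class JobModel(BaseModel):
    id: int
    start_date: datetime = Field(alias='startDate')
    title: str


def parse_job(raw: dict) -> Optional[JobModel]:
    """Return a validated JobModel, or None if the payload is invalid."""
    try:
        return JobModel(**raw)
    except ValidationError as exc:
        # exc.errors() lists every field that failed validation.
        print(f'Invalid job payload: {len(exc.errors())} error(s)')
        return None


good = parse_job({'id': 10, 'startDate': 1640581200000, 'title': 'Physician'})
# 'id' is not an integer and 'startDate' is missing, so validation fails.
bad = parse_job({'id': 'not-a-number', 'title': 'Physician'})
```

Whether to return None, log, or re-raise is a design choice that depends on how the calling code should react to bad API data.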

Upvotes: 3
