Reputation: 5552
I have a multilingual FastAPI application connected to MongoDB. Each MongoDB document is duplicated in the two available languages and structured this way (simplified example):
{
    "_id": xxxxxxx,
    "en": {
        "title": "Drinking Water Composition",
        "description": "Drinking water composition expressed in... with pesticides.",
        "category": "Water",
        "tags": ["water","pesticides"]
    },
    "fr": {
        "title": "Composition de l'eau de boisson",
        "description": "Composition de l'eau de boisson exprimée en... présence de pesticides....",
        "category": "Eau",
        "tags": ["eau","pesticides"]
    }
}
I therefore implemented two models, DatasetFR and DatasetEN, each of which references language-specific external models (Enum) for category and tags.
class DatasetFR(BaseModel):
    title: str
    description: str
    category: CategoryFR
    tags: Optional[List[TagsFR]]

# same for DatasetEN, changing the language suffix to EN
In the routes definition I forced the language parameter to declare the corresponding Model and get the corresponding validation.
@router.post("?lang=fr", response_description="Add a dataset")
async def create_dataset(request:Request, dataset: DatasetFR = Body(...), lang:str="fr"):
    ...
    return JSONResponse(status_code=status.HTTP_201_CREATED, content=created_dataset)

@router.post("?lang=en", response_description="Add a dataset")
async def create_dataset(request:Request, dataset: DatasetEN = Body(...), lang:str="en"):
    ...
    return JSONResponse(status_code=status.HTTP_201_CREATED, content=created_dataset)
But this seems to contradict the DRY principle. So I wonder whether someone knows an elegant solution to either:
- given the lang parameter, dynamically call the corresponding model; or
- create a parent model Dataset that takes the lang argument and retrieves the child model Dataset<LANG>.
This would greatly simplify building my API routes and calling my models, and would halve the amount of code to write.
Upvotes: 8
Views: 7985
Reputation: 34551
One solution is the following. Define lang as a Query parameter and add a regular expression that the parameter must match. In your case, that would be ^(fr|en)$, meaning that only fr or en are valid inputs. If the value does not match, the request stops there and the client receives a "string does not match regex..." error.
Next, define the body parameter as a generic dict and declare it as a Body field, thus instructing FastAPI to expect a JSON body.
Then create a dictionary of your models that you can use to look up a model by the lang value. Once you have the corresponding model, try to parse the JSON body using models[lang].parse_obj(body) (equivalent to models[lang](**body)). If no ValidationError is raised, you know the resulting model instance is valid. Otherwise, return an HTTP_422_UNPROCESSABLE_ENTITY error that includes the validation errors, which you can handle as desired.
If you would also like FR and EN to be valid lang values, make the regex case-insensitive with (?i)^(fr|en)$ instead (the inline flag goes at the very start of the pattern, since Python 3.11 rejects global flags placed anywhere else), and make sure to convert lang to lower case when looking up a model (i.e., models[lang.lower()].parse_obj(body)).
import pydantic
from fastapi import FastAPI, Response, status, Body, Query
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder

# Map each supported language code to its Pydantic model
models = {"fr": DatasetFR, "en": DatasetEN}

@router.post("/", response_description="Add a dataset")
async def create_dataset(body: dict = Body(...), lang: str = Query(..., regex="^(fr|en)$")):
    try:
        # Validate the raw JSON body against the model selected by lang
        model = models[lang].parse_obj(body)
    except pydantic.ValidationError as e:
        return Response(content=e.json(), status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, media_type="application/json")
    return JSONResponse(content=jsonable_encoder(dict(model)), status_code=status.HTTP_201_CREATED)
Since the two models share some attributes (i.e., title and description), you could define a parent model (e.g., Dataset) with those two attributes and have the DatasetFR and DatasetEN models inherit them.
class Dataset(BaseModel):
    title: str
    description: str

class DatasetFR(Dataset):
    category: CategoryFR
    tags: Optional[List[TagsFR]]

class DatasetEN(Dataset):
    category: CategoryEN
    tags: Optional[List[TagsEN]]
Additionally, it might be better to move the logic out of the route and into a dependency function that returns the model if it passes validation and otherwise raises an HTTPException, as also demonstrated by @tiangolo. You can use jsonable_encoder, which FastAPI uses internally, to encode the output of e.errors() (the same function can also be used when returning the JSONResponse).
from fastapi.exceptions import HTTPException
from fastapi import Depends

models = {"fr": DatasetFR, "en": DatasetEN}

# Dependency: validates the body against the model selected by lang
async def checker(body: dict = Body(...), lang: str = Query(..., regex="^(fr|en)$")):
    try:
        model = models[lang].parse_obj(body)
    except pydantic.ValidationError as e:
        raise HTTPException(detail=jsonable_encoder(e.errors()), status_code=status.HTTP_422_UNPROCESSABLE_ENTITY)
    return model

@router.post("/", response_description="Add a dataset")
async def create_dataset(model: Dataset = Depends(checker)):
    return JSONResponse(content=jsonable_encoder(dict(model)), status_code=status.HTTP_201_CREATED)
A further approach would be to have a single Pydantic model (let's say Dataset) and customize the validators for the category and tags fields. You can also define lang as part of Dataset, so there is no need to have it as a query parameter. You can use a set, as described here, to keep the values of each Enum class, so that you can efficiently check whether a value exists in the Enum, and use dictionaries to quickly look up the right set by the lang attribute. In the case of tags, to verify that every element in the list is valid, use set.issubset, as described here. If an attribute is not valid, you can raise ValueError, as shown in the documentation, "which will be caught and used to populate ValidationError" (see "Note" section here). Again, if you need upper-case lang codes to be valid inputs, adjust the regex pattern as described earlier.
P.S. You don't even need to use Enum with this approach. Instead, populate each set below with the permitted values. For instance:
categories_FR = {"Eau"}
categories_EN = {"Water"}
tags_FR = {"eau", "pesticides"}
tags_EN = {"water", "pesticides"}
Additionally, if you would rather not use a regex, but have a custom validation error for the lang attribute as well, you could add lang to the same validator decorator and validate it in a similar way, before the other two fields.
from pydantic import validator

# Permitted values per language, stored as sets for fast membership tests
categories_FR = set(item.value for item in CategoryFR)
categories_EN = set(item.value for item in CategoryEN)
tags_FR = set(item.value for item in TagsFR)
tags_EN = set(item.value for item in TagsEN)
cats = {"fr": categories_FR, "en": categories_EN}
tags = {"fr": tags_FR, "en": tags_EN}

def raise_error(values):
    raise ValueError(f'value is not a valid enumeration member; permitted: {values}')

class Dataset(BaseModel):
    lang: str = Body(..., regex="^(fr|en)$")
    title: str
    description: str
    category: str
    tags: List[str]

    @validator("category", "tags")
    def validate_atts(cls, v, values, field):
        # lang is missing from values if it failed its own validation
        lang = values.get('lang')
        if lang:
            if field.name == "category":
                if v not in cats[lang]: raise_error(cats[lang])
            elif field.name == "tags":
                if not set(v).issubset(tags[lang]): raise_error(tags[lang])
        return v

@router.post("/", response_description="Add a dataset")
async def create_dataset(model: Dataset):
    return JSONResponse(content=jsonable_encoder(dict(model)), status_code=status.HTTP_201_CREATED)
Note that in Pydantic V2, @validator has been deprecated and replaced by @field_validator. Please have a look at this answer for more details and examples.
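For readers on Pydantic V2, the single-model approach might look roughly like the sketch below. It assumes Pydantic 2.x is installed; the cats and tags dictionaries are hypothetical stand-ins populated by hand, as in the P.S. above, and the error message wording is illustrative only.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator

# Hypothetical permitted values per language (stand-ins for the Enum-derived sets)
cats = {"fr": {"Eau"}, "en": {"Water"}}
tags = {"fr": {"eau", "pesticides"}, "en": {"water", "pesticides"}}


class Dataset(BaseModel):
    lang: str = Field(pattern="^(fr|en)$")  # V2 names this constraint `pattern`
    title: str
    description: str
    category: str
    tags: List[str]

    @field_validator("category", "tags")
    @classmethod
    def validate_atts(cls, v, info):
        # info.data holds the fields validated so far; lang is absent if it failed
        lang = info.data.get("lang")
        if lang:
            is_category = info.field_name == "category"
            allowed = cats[lang] if is_category else tags[lang]
            given = {v} if is_category else set(v)
            if not given.issubset(allowed):
                raise ValueError(f"value is not permitted; allowed: {sorted(allowed)}")
        return v
```

Because lang is declared first, it is validated before category and tags, which is what lets the validator read it from info.data.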
Another approach would be to use Discriminated Unions, as described in this answer.
As per the documentation:
When Union is used with multiple submodels, you sometimes know exactly which submodel needs to be checked and validated and want to enforce this. To do that you can set the same field - let's call it my_discriminator - in each of the submodels with a discriminated value, which is one (or many) Literal value(s). For your Union, you can set the discriminator in its value: Field(discriminator='my_discriminator').
Setting a discriminated union has many benefits:
- validation is faster since it is only attempted against one model
- only one explicit error is raised in case of failure
- the generated JSON schema implements the associated OpenAPI specification
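As a rough illustration of the discriminated-union idea applied to this question, the sketch below uses a lang field as the discriminator. The enum members and the DatasetPayload wrapper model are hypothetical names introduced only for the example; it assumes a Pydantic version that supports Field(discriminator=...) (1.9 or later).

```python
from enum import Enum
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, Field, ValidationError


# Hypothetical enums standing in for the question's external Enum models
class CategoryFR(str, Enum):
    eau = "Eau"


class CategoryEN(str, Enum):
    water = "Water"


class TagsFR(str, Enum):
    eau = "eau"
    pesticides = "pesticides"


class TagsEN(str, Enum):
    water = "water"
    pesticides = "pesticides"


class DatasetFR(BaseModel):
    lang: Literal["fr"]  # discriminator value for the French model
    title: str
    description: str
    category: CategoryFR
    tags: Optional[List[TagsFR]] = None


class DatasetEN(BaseModel):
    lang: Literal["en"]  # discriminator value for the English model
    title: str
    description: str
    category: CategoryEN
    tags: Optional[List[TagsEN]] = None


class DatasetPayload(BaseModel):
    # Only the submodel whose `lang` literal matches is validated
    dataset: Union[DatasetFR, DatasetEN] = Field(discriminator="lang")
```

With this layout the client sends lang inside the body rather than as a query parameter, and a wrapper such as DatasetPayload could serve as the request body type of a single route.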
Upvotes: 4
Reputation: 396
There are two parts to the answer: the API call and the data structure.
For the API call, you could separate the languages into two routes such as /api/v1/fr/... and /api/v1/en/... (separating resource representations!) and use fastapi.APIRouter to declare the same route twice, changing the validation schema for each route to the one you want to use.
You could start by declaring a common BaseModel as an ABC, as well as an ABCEnum.
from abc import ABC
from pydantic import BaseModel

# MyEnumABC, MyEnumFR and MyEnumEn are assumed to be defined elsewhere
class MyModelABC(ABC, BaseModel):
    attribute1: MyEnumABC

class MyModelFr(MyModelABC):
    attribute1: MyEnumFR

class MyModelEn(MyModelABC):
    attribute1: MyEnumEn
Then you can select the appropriate model for the routes through a class factory:
# The values are model classes, not instances, hence type[...]
my_class_factory: dict[str, type[MyModelABC]] = {
    "fr": MyModelFr,
    "en": MyModelEn,
}
Finally, you can create your routes through a route factory:
def generate_language_specific_router(language: str, ...) -> APIRouter:
    # APIRouter requires the prefix to start with "/"
    router = APIRouter(prefix=f"/{language}")
    MySelectedModel: type[MyModelABC] = my_class_factory[language]

    @router.post("/")
    def post_something(my_model_data: MySelectedModel):
        # My internal logic
        ...

    return router
About the second part (internal computation and data storage), internationalisation is often done through hashmaps.
The standard Python library gettext could be investigated.
Otherwise, the original-language string can be used explicitly as the key/hash, with the translations mapped to it (including the original language too, if you want consistency in your calls).
It can look like:
dictionnary_of_babel = {
    "word1": {
        "en": "word1",
        "fr": "mot1",
    },
    "word2": {
        "en": "word2",
    },
    "Drinking Water Composition": {
        "en": "Drinking Water Composition",
        "fr": "Composition de l'eau de boisson",
    },
}
my_arbitrary_object = {
    "attribute1": "word1",
    "attribute2": "word2",
    "attribute3": "Drinking Water Composition",
}
my_translated_object = {}
for attribute, english_sentence in my_arbitrary_object.items():
    if "fr" in dictionnary_of_babel[english_sentence].keys():
        my_translated_object[attribute] = dictionnary_of_babel[english_sentence]["fr"]
    else:
        # Fall back to English when no French translation exists
        my_translated_object[attribute] = dictionnary_of_babel[english_sentence]["en"]  # or without "en"
expected_translated_object = {
    "attribute1": "mot1",
    "attribute2": "word2",
    "attribute3": "Composition de l'eau de boisson",
}
assert expected_translated_object == my_translated_object
This code should run as is
A proposal for the MongoDB representation, if we don't want to have a separate table for translations, could be a data structure such as:
# normal:
my_attribute: "sentence"
# internationalized
my_attribute_internationalized: {
    sentence: {
        original_lang: "sentence",
        lang1: "sentence_lang1",
        lang2: "sentence_lang2",
    }
}
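Reading such a structure back could be sketched as below. This is only an illustration: the helper name localized, the "original_lang" key, and the flattened layout (each internationalized attribute mapping language codes directly to strings) are assumptions made for the example, not part of the proposal above.

```python
from typing import Any, Dict


def localized(document: Dict[str, Any], attribute: str, lang: str,
              fallback: str = "original_lang") -> str:
    """Return the attribute in `lang`, or in the original language if missing."""
    translations = document[attribute]
    return translations.get(lang, translations[fallback])


# Hypothetical document following the flattened layout described above
doc = {
    "title": {
        "original_lang": "Drinking Water Composition",
        "fr": "Composition de l'eau de boisson",
    }
}

french_title = localized(doc, "title", "fr")
german_title = localized(doc, "title", "de")  # no "de" entry, so the original is used
```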
A simple tactic to generalize string translation is to define a short helper function, conventionally named _(), that wraps the translation lookup:
CURRENT_MODULE_LANG = "fr"

def _(original_string: str) -> str:
    """Switch from original_string to translation"""
    return dictionnary_of_babel[original_string][CURRENT_MODULE_LANG]

Then call it everywhere a translation is needed:
>>> print(_("word1"))
mot1
You can find a reference to this practice in the Django documentation on internationalization in Python code.
For static translation (for example a website or documentation), you can use .po files and editors like Poedit (see the French translation of the Python docs for a practical use case)!
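Loading compiled .po catalogs through the standard gettext module might look like the minimal sketch below. The domain name "messages" and the directory "locale" are hypothetical; with fallback=True, a missing catalog yields a NullTranslations object that returns strings unchanged instead of raising an error.

```python
import gettext

# "messages" (domain) and "locale" (directory) are assumed names for this sketch.
# gettext looks for locale/fr/LC_MESSAGES/messages.mo, compiled from a .po file.
translation = gettext.translation(
    "messages", localedir="locale", languages=["fr"], fallback=True
)
_ = translation.gettext

# Returns the French text if the catalog exists, else the original string
title = _("Drinking Water Composition")
```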
Upvotes: 4