Marc Carré

Reputation: 1454

Validate a recursive data structure (e.g. tree) using Python Cerberus (v1.3.5)

What is the right way to model a recursive data structure's schema in Cerberus?

Attempt #1:

from cerberus import Validator, schema_registry
schema_registry.add("leaf", {"value": {"type": "integer", "required": True}})
schema_registry.add("tree", {"type": "dict", "anyof_schema": ["leaf", "tree"]})
v = Validator(schema = {"root": {"type": "dict", "schema": "tree"}})

Error:

cerberus.schema.SchemaError: {'root': [{
    'schema': [
        'no definitions validate', {
            'anyof definition 0': [{
                'anyof_schema': ['must be of dict type'], 
                'type': ['null value not allowed'],
            }],
            'anyof definition 1': [
                'Rules set definition tree not found.'
            ],
        },
    ]},
]}

Attempt #2:

The error above suggests that a rules set definition is needed for tree, so I registered one:

from cerberus import Validator, schema_registry, rules_set_registry
schema_registry.add("leaf", {"value": {"type": "integer", "required": True}})
rules_set_registry.add("tree", {"type": "dict", "anyof_schema": ["leaf", "tree"]})
v = Validator(schema = {"root": {"type": "dict", "schema": "tree"}})

v.validate({"root": {"value": 1}})
v.errors
v.validate({"root": {"a": {"value": 1}}})
v.errors
v.validate({"root": {"a": {"b": {"c": {"value": 1}}}}})
v.errors

Output:

False
{'root': ['must be of dict type']}

for all 3 examples.

Expected behaviour

Ideally, I would like all of the documents below to pass validation:

v = Validator(schema = {"root": {"type": "dict", "schema": "tree"}})
assert v.validate({"root": {"value": 1}}), v.errors
assert v.validate({"root": {"a": {"value": 1}}}), v.errors
assert v.validate({"root": {"a": {"b": {"c": {"value": 1}}}}}), v.errors

Upvotes: 1

Views: 84

Answers (1)

Marc Carré

WARNING

The below is not a complete solution.
If someone has a full working solution with cerberus, please share it, and I will happily mark your answer as the solution.

Additional constraint from my actual problem

The tree's leaves contain some keys that must match another part of the document I am validating. For this reason, I have an additional is_in validation method in my custom Validator. However, I couldn't find a good way to have a child validator for the leaves, while still keeping a reference to another part of the document at the root.

Observation

I have now spent more time "fighting" cerberus than it would have taken me to implement a custom input validation function, so I may do that instead for now, or try jsonschema. (EDIT: see attempt #4 below.)
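
For the reduced problem from the question, such a hand-written check could be quite short. The following is only a minimal sketch, not what I ended up using: the names is_leaf, is_tree and is_valid are made up for illustration, and it assumes a leaf is a dict with an integer "value" and that an empty dict is not a valid tree.

```python
from typing import Any


def is_leaf(node: Any) -> bool:
    """A leaf is a dict whose "value" key holds an integer."""
    value = node.get("value") if isinstance(node, dict) else None
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_tree(node: Any) -> bool:
    """A tree is either a leaf or a non-empty dict whose values are all trees."""
    if not isinstance(node, dict):
        return False
    if is_leaf(node):
        return True
    # Assumption: an empty dict is rejected rather than treated as an empty tree.
    return bool(node) and all(
        isinstance(key, str) and is_tree(child) for key, child in node.items()
    )


def is_valid(document: Any) -> bool:
    """The document must be a dict with a "root" key that holds a tree."""
    return isinstance(document, dict) and is_tree(document.get("root"))


if __name__ == "__main__":
    for document in (
        {"root": {"value": 1}},
        {"root": {"a": {"value": 1}}},
        {"root": {"a": {"b": {"c": {"value": 1}}}}},
    ):
        print(document, is_valid(document))
```

If staying with cerberus matters, a function like this could presumably be attached through the check_with rule, which accepts a callable taking (field, value, error), but I have not tested that here.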

Attempt #3: cerberus custom validator

Hopefully, the below logic can still be useful to someone.

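from typing import Any

from cerberus import Validator

# NOTE: KEYS (the keys that identify a leaf) and SCHEMA (the schema applied to
# each leaf) are not defined in this snippet and must be provided separately.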


class ManifestValidator(Validator):
    def _validate_type_tree(self: Validator, value: Any) -> bool:
        if not isinstance(value, dict):
            return False
        for v in value.values():
            if isinstance(v, dict):
                if all(key in v for key in KEYS):
                    schema = self._resolve_schema(SCHEMA)
                    validator = self._get_child_validator(
                        document_crumb=v,
                        schema_crumb=(v, "schema"),
                        root_document=self.root_document,
                        root_schema=self.root_schema,
                        schema=schema,
                    )
                    if not validator(v, update=self.update) or validator._errors:
                        self._error(validator._errors)
                        return False
                elif not self._validate_type_tree(v):
                    return False
            else:
                return False
        return True

    def _validate_is_in(self: Validator, path: str, field: str, value: str) -> bool:
        """{'type': 'string'}"""
        document = self.root_document
        for element in path.split("."):
            if element not in document:
                self._error(field, f"{path} does not exist in {document}")
                return False
            document = document[element]
        if not isinstance(document, list):
            self._error(
                field,
                f"{path} does not point to a list but to {document} of type {type(document)}",
            )
            return False
        if value not in document:
            self._error(field, f"{value} is not present in {document} at {path}.")
            return False
        return True

Attempt #4: jsonschema + custom validation logic

from jsonschema import validate


SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "root": {
            "oneOf": [
                {"$ref": "#/$defs/tree"},
                {"$ref": "#/$defs/leaf"},
            ],
        },
    },
    "required": [
        "root",
    ],
    "$defs": {
        "tree": {
            "type": "object",
            "patternProperties": {
                "^[a-z]+([_-][a-z]+)*$": {
                    "oneOf": [
                        {"$ref": "#/$defs/tree"},
                        {"$ref": "#/$defs/leaf"},
                    ],
                },
            },
            "additionalProperties": False,
        },
        "leaf": {
            "type": "object",
            "properties": {
                # In reality, the leaf is a more complex object, but as a reduction of my problem:
                "value": {
                    "type": "number",
                },
            },
            "required": [
                "value",
            ],
        },
    },
}


TREES = [
    {"root": {"value": 1}},
    {"root": {"a": {"value": 1}}},
    {"root": {"a": {"b": {"c": {"value": 1}}}}},
    {"root": {"a-subtree": {"b-subtree": {"c-subtree": {"value": 1}}}}},
]


for tree in TREES:
    validate(tree, SCHEMA)

For my additional constraint (is_in), JSON Pointers, Relative JSON Pointers, or $data look like they could be useful in simpler cases. For what I needed, though, I decided to implement custom validation logic that runs after the jsonschema validation, which was a good first step to prove that the document is well-formed.
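
To give an idea of what that second step can look like, here is a minimal sketch under stated assumptions rather than my actual code: it assumes the document has already passed the jsonschema validation, that each leaf carries a hypothetical "group" key, and that the allowed values live in a list at a hypothetical top-level "groups" key. The helper names resolve, iter_leaves and check_is_in are also made up for this example.

```python
from typing import Any, Iterator, List, Tuple


def resolve(document: dict, path: str) -> Any:
    """Follow a dotted path such as "meta.groups" from the document root."""
    current: Any = document
    for element in path.split("."):
        if not isinstance(current, dict) or element not in current:
            raise KeyError(f"{path} does not exist in the document")
        current = current[element]
    return current


def iter_leaves(node: dict, crumb: Tuple[str, ...] = ()) -> Iterator[tuple]:
    """Yield (crumb, leaf) pairs; a leaf is any dict that has a "value" key."""
    if "value" in node:
        yield crumb, node
        return
    for key, child in node.items():
        yield from iter_leaves(child, crumb + (key,))


def check_is_in(document: dict, leaf_key: str, path: str) -> List[str]:
    """Return one error message per leaf whose leaf_key is not listed at path."""
    allowed = resolve(document, path)
    if not isinstance(allowed, list):
        raise TypeError(f"{path} does not point to a list")
    errors = []
    for crumb, leaf in iter_leaves(document["root"], ("root",)):
        if leaf.get(leaf_key) not in allowed:
            location = ".".join(crumb)
            errors.append(f"{location}: {leaf.get(leaf_key)!r} is not present in {path}")
    return errors


if __name__ == "__main__":
    document = {
        "groups": ["x", "y"],
        "root": {
            "a": {"value": 1, "group": "x"},
            "b": {"c": {"value": 2, "group": "z"}},
        },
    }
    print(check_is_in(document, "group", "groups"))
```

Because the structural checks are left to the schema, the walk itself can stay simple and only has to report the leaves that break the cross-reference.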

Upvotes: 0