RoyalSwish

Reputation: 1573

Validate JSON Schema which has fixed keys and user defined keys in Python

I'm trying to validate a JSON file provided by a user. The JSON contains certain fixed keys as well as some user-defined keys. I want to validate that the object contains the fixed keys in a certain format, and that the user-defined keys are in a certain format too (their values always follow a defined format).

I came across the post "Validate JSON data using python", but the documentation for jsonschema.validate doesn't really cover user-defined keys, nor how to specify that a key should hold a list of dicts, or a dict whose values must each be a list of dicts.

Here's a sample of the JSON structure:

{
    "a": "some value",
    "b": "some value",
    "c": {
        "custom_a": [{...}],
        "custom_b": [{...}]
    },
    "d": [{...}]
}

I have tried doing the following:

import json
from jsonschema import validate

my_json = json.loads(<JSON String following above pattern>) 

schema = {
    "a" : {"type": "string"},
    "b" : {"type": "string"},
    "c" : {[{}]},
    "d": [{}]
}

validate(instance=my_json, schema=schema) #raises TypeError on "c" and "d" in schema spec

I have also tried the following schema spec, but I get stuck on how to handle the custom keys, and also nested lists within dicts, etc.

schema = {
    "a" : {"type": "string"},
    "b" : {"type": "string"},
    "c" : {
        "Unsure what to define here": {"type": "list"} #but this is a list of dicts
    },
    "d": {"type": "list"} #but this is a list of dicts
}

Upvotes: 0

Views: 71

Answers (2)

A l w a y s S u n n y

Reputation: 38502

Several Python libraries can validate JSON data against schemas that mix fixed and user-defined keys. Each handles dynamic structures a little differently.

Three common options are shown below.

Using jsonschema:

from jsonschema import validate, ValidationError

# Define JSON Schema
schema = {
    "type": "object",
    "properties": {
        "a": {"type": "string"},
        "b": {"type": "string"},
        "c": {
            "type": "object",
            "patternProperties": {
                "^custom_": {  # Any key in "c" must start with "custom_"
                    "type": "array",
                    "items": {"type": "object"}
                }
            },
            "additionalProperties": False
        },
        "d": {
            "type": "array",
            "items": {"type": "object"}
        }
    },
    "required": ["a", "b", "c", "d"],
    "additionalProperties": False
}

# Sample JSON data
data = {
    "a": "some value",
    "b": "another value",
    "c": {
        "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
        "custom_b": [{"key3": "value3"}]
    },
    "d": [{"key4": "value4"}, {"key5": "value5"}]
}

# Validate the JSON data
try:
    validate(instance=data, schema=schema)
    print("Validation successful!")
except ValidationError as e:
    print("Validation failed:", e.message)

Using marshmallow:

from marshmallow import Schema, fields, validate, ValidationError

class CustomEntrySchema(Schema):
    # This allows any string keys and values in each dictionary
    class Meta:
        unknown = 'include'

class MainSchema(Schema):
    a = fields.String(required=True)
    b = fields.String(required=True)
    c = fields.Dict(
        keys=fields.String(validate=validate.Regexp(r'^custom_')),
        values=fields.List(fields.Nested(CustomEntrySchema)),
        required=True
    )
    d = fields.List(fields.Nested(CustomEntrySchema), required=True)

# Sample JSON data
data = {
    "a": "some value",
    "b": "another value",
    "c": {
        "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
        "custom_b": [{"key3": "value3"}]
    },
    "d": [{"key4": "value4"}, {"key5": "value5"}]
}

# Validate the JSON data
schema = MainSchema()
try:
    schema.load(data)
    print("Validation successful!")
except ValidationError as e:
    print("Validation failed:", e.messages)

Using pydantic:

from pydantic import BaseModel, ValidationError, RootModel, model_validator
from typing import List, Dict
import re

class CustomEntryModel(RootModel[Dict[str, str]]):
    """An entry of 'c' or 'd': a dict with arbitrary string keys and string values."""

class MainModel(BaseModel):
    a: str
    b: str
    c: Dict[str, List[CustomEntryModel]]  # We'll validate keys in 'c' manually
    d: List[CustomEntryModel]

    @model_validator(mode="before")
    @classmethod
    def validate_custom_keys(cls, values):
        # Check that all keys in 'c' start with "custom_"
        c_data = values.get("c", {})
        for key in c_data:
            if not re.match(r'^custom_', key):
                raise ValueError(f"Key '{key}' in 'c' must start with 'custom_'")
        return values

# Sample JSON data
data = {
    "a": "some value",
    "b": "another value",
    "c": {
        "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
        "custom_b": [{"key3": "value3"}]
    },
    "d": [{"key4": "value4"}, {"key5": "value5"}]
}

# Validate the JSON data
try:
    model = MainModel(**data)
    print("Validation successful!")
except ValidationError as e:
    print("Validation failed:", e)
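One possible variation, assuming pydantic v2: attach the key check to the "c" field with field_validator, so a failure is reported at that field's location. This sketch also swaps the entry type to Dict[str, Any] so entry values need not be strings. That is an assumption about the data, not something the question states. The names ConfigModel and error_locations are hypothetical.

```python
from typing import Any, Dict, List

from pydantic import BaseModel, ValidationError, field_validator


class ConfigModel(BaseModel):
    a: str
    b: str
    c: Dict[str, List[Dict[str, Any]]]
    d: List[Dict[str, Any]]

    @field_validator("c")
    @classmethod
    def keys_start_with_custom(cls, value):
        # Runs after "c" has already been checked to be a dict of lists of dicts.
        bad_keys = [key for key in value if not key.startswith("custom_")]
        if bad_keys:
            raise ValueError(f"keys must start with 'custom_': {bad_keys}")
        return value


def error_locations(payload):
    """Return the location tuple of each validation error, or [] if valid."""
    try:
        ConfigModel(**payload)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


good_data = {
    "a": "some value",
    "b": "another value",
    "c": {"custom_a": [{"key1": "value1"}]},
    "d": [{"key4": 4}],
}

# Hypothetical bad input: "other" does not start with "custom_".
bad_data = dict(good_data, c={"other": [{"key1": "value1"}]})

print(error_locations(good_data))
print(error_locations(bad_data))
```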

Output when I ran the three scripts (jsonschema, marshmallow, pydantic) at once:

Validation successful!
Validation successful!
Validation successful!

Upvotes: 1

Jeremy Fiel

Reputation: 3252

Define the known properties as usual, and describe the user-defined properties by setting additionalProperties to a schema that their values must satisfy:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "a": {"type": "string"},
        "b": {"type": "string"},
        "c": {
            "additionalProperties": {
                "type": "string"
            }
        },
        "d": {
            "type": "array",
            "items": {
                "type": "object"
            }
        }
    }
}

This will allow an instance such as:

{
    "a": "some_value",
    "b": "some_value",
    "c": {
        "custom_keyword": "some_value"
    },
    "d": [
        {
            "custom_keyword": 1
        }
    ]
}

Upvotes: 0
