Li-Wen Yip

Reputation: 335

Convert nested dict/JSON to a Django ORM model without hard-coding the data structure

I want to import data from JSON files into my Django database. The JSON contains nested objects.

Current steps are:

  1. Set up my Django models to match the JSON schema (done manually - see models.py file below)
  2. Import the JSON file into a Python dict using mydict = json.loads(file.read()) (done)
  3. Convert the dict to Django models (done - but the solution is not pretty)

Is there a way to convert my nested dict into Django models (i.e. step 3) without hard-coding the data structure into the logic?

Bonus points for automatically generating the Django models (i.e. the models.py file) from an example JSON file.

Thanks in advance!

How I'm currently doing it

Step 3 is easy if the dict does not contain any nested dicts: just construct a new object from the dict, i.e. MyModel.objects.create(**mydict), or use Django fixtures.

However, because my json/dict contains nested objects, I'm currently doing step 3 like this:

import json

# read the json file into a python dict (myfile is an already opened file object)
d = json.loads(myfile.read())

# construct top-level object using the top-level dict
# (excluding nested lists of dicts called 'judges' and 'contestants')
c = Contest.objects.create(**{k:v for k,v in d.items() if k not in ('judges', 'contestants')})

# construct nested objects using the nested dicts
for judge in d['judges']:
    c.judge_set.create(**judge)
for contestant in d['contestants']:
    ct = c.contestant_set.create(**{k:v for k,v in contestant.items() if k not in ('singers', 'songs')})
    # all contestants sing songs
    for song in contestant['songs']:
        ct.song_set.create(**song)
    # not all contestants have a list of singers
    if 'singers' in contestant:
        for singer in contestant['singers']:
            ct.singer_set.create(**singer)

This works, but it requires the data structure to be hard-coded into the logic.

Data structures

Example JSON looks like this:

{
  "assoc": "THE BRITISH ASSOCIATION OF BARBERSHOP SINGERS",
  "contest": "QUARTET FINAL (NATIONAL STREAM)",
  "location": "CHELTENHAM",
  "year": "2007/08",
  "date": "25/05/2008",
  "type": "quartet final",
  "filename": "BABS/2008QF.pdf",
  "judges": [
    {"cat": "m", "name": "Rod"},
    {"cat": "m", "name": "Bob"},
    {"cat": "p", "name": "Pat"},
    {"cat": "p", "name": "Bob"},
    {"cat": "s", "name": "Mark"},
    {"cat": "s", "name": "Barry"},
    {"cat": "a", "name": "Phil"}
  ],
  "contestants": [
    {
      "prev_tot_score": "1393",
      "tot_score": "2774",
      "rank_m": "1",
      "rank_s": "1",
      "rank_p": "1",
      "rank": "1", "name": "Monkey Magic",
      "pc_score": "77.1",
      "songs": [
        {"title": "Undecided Medley","m": "234","s": "226","p": "241"},
        {"title": "What Kind Of Fool Am I","m": "232","s": "230","p": "230"},
        {"title": "Previous","m": "465","s": "462","p": "454"}
      ],
      "singers": [
        {"part": "tenor","name": "Alan"},
        {"part": "lead","name": "Zac"},
        {"part": "bari","name": "Joe"},
        {"part": "bass","name": "Duncan"}
      ]
    },
    {
      "prev_tot_score": "1342",
      "tot_score": "2690",
      "rank_m": "2",
      "rank_s": "2",
      "rank_p": "2",
      "rank": "2", "name": "Evolution",
      "pc_score": "74.7",
      "songs": [
        {"title": "It's Impossible","m": "224","s": "225","p": "218"},
        {"title": "Come Fly With Me","m": "225","s": "222","p": "228"},
        {"title": "Previous","m": "448","s": "453","p": "447"}
      ],
      "singers": [
        {"part": "tenor","name": "Tony"},
        {"part": "lead","name": "Michael"},
        {"part": "bari","name": "Geoff"},
        {"part": "bass","name": "Stuart"}
      ]
    }
  ]
}

My models.py file:

from django.db import models

# Create your models here.

class Contest(models.Model):
    assoc = models.CharField(max_length=100)
    contest = models.CharField(max_length=100)
    date = models.DateField()
    filename = models.CharField(max_length=100)
    location = models.CharField(max_length=100)
    type = models.CharField(max_length=20)
    year = models.CharField(max_length=20)


class Judge(models.Model):
    contest = models.ForeignKey(Contest, on_delete=models.CASCADE)
    name = models.CharField(max_length=60)
    cat = models.CharField('Category', max_length=2)


class Contestant(models.Model):
    contest = models.ForeignKey(Contest, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    tot_score = models.IntegerField('Total Score')
    rank_m = models.IntegerField()
    rank_s = models.IntegerField()
    rank_p = models.IntegerField()
    rank = models.IntegerField()
    pc_score = models.DecimalField(max_digits=4, decimal_places=1)
    # optional fields
    director = models.CharField(max_length=100, blank=True, null=True)
    size = models.IntegerField(blank=True, null=True)
    prev_tot_score = models.IntegerField(blank=True, null=True)


class Song(models.Model):
    contestant = models.ForeignKey(Contestant, on_delete=models.CASCADE)
    title = models.CharField(max_length=100)
    m = models.IntegerField('Music')
    s = models.IntegerField('Singing')
    p = models.IntegerField('Performance')

class Singer(models.Model):
    contestant = models.ForeignKey(Contestant, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    part = models.CharField('Category', max_length=5)

Upvotes: 1

Views: 1464

Answers (1)

Maresh

Reputation: 4712

You could walk the JSON object recursively and use a key-to-class mapping to instantiate your models dynamically. Here's the idea (an untested sketch, not a complete solution!):

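# Map each nested JSON key to (model class, name of its ForeignKey to the parent).
# The ForeignKey names come from the models.py in the question.
key_model = {
    "judges": (Judge, "contest"),
    "contestants": (Contestant, "contest"),
    "songs": (Song, "contestant"),
    "singers": (Singer, "contestant"),
}

def make_sub_models(parent, model, fk_name, vals):
    # create one child row per dict in vals, each pointing back at parent
    for v in vals:
        create_model(model, v, **{fk_name: parent})

def create_model(model, obj, **extra):
    # model should be the class, obj a dict, extra the parent foreign key (if any)
    to_process = []  # nested lists, handled once this object has been saved
    fields = dict(extra)  # plain attributes of this object
    for k, v in obj.items():
        if isinstance(v, list):  # you probably want to handle dict as well
            to_process.append((k, v))
        else:
            fields[k] = v

    parent_obj = model.objects.create(**fields)
    # now process the children
    for k, v in to_process:
        child_model, fk_name = key_model[k]
        make_sub_models(parent_obj, child_model, fk_name, v)

    return parent_obj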
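
To check the traversal without touching a database, the same walk can be tried in plain Python. This is only a sketch under assumptions: plan_rows and KEY_TO_MODEL are hypothetical names, model classes are replaced by strings, and instead of creating rows it returns the list of rows it would create, each with the index of its parent.

```python
# Framework-free sketch: plan_rows and KEY_TO_MODEL are hypothetical names.
# Model names are plain strings here so the traversal runs without Django.
KEY_TO_MODEL = {
    "judges": "Judge",
    "contestants": "Contestant",
    "songs": "Song",
    "singers": "Singer",
}


def plan_rows(model_name, obj, parent=None, rows=None):
    """Return a flat list of (model_name, fields, parent_index) tuples.

    parent_index is the position in the list of the row that owns this one,
    or None for the top-level object.
    """
    if rows is None:
        rows = []
    fields = {}   # scalar attributes of this object
    nested = []   # (key, list of child dicts) pairs, handled afterwards
    for key, value in obj.items():
        if isinstance(value, list):
            nested.append((key, value))
        else:
            fields[key] = value
    rows.append((model_name, fields, parent))
    index = len(rows) - 1  # children refer back to this row
    for key, children in nested:
        for child in children:
            plan_rows(KEY_TO_MODEL[key], child, parent=index, rows=rows)
    return rows


# Trimmed-down version of the example JSON from the question
sample = {
    "contest": "QUARTET FINAL (NATIONAL STREAM)",
    "judges": [{"cat": "m", "name": "Rod"}],
    "contestants": [
        {
            "name": "Monkey Magic",
            "songs": [{"title": "Undecided Medley", "m": "234"}],
            "singers": [{"part": "tenor", "name": "Alan"}],
        }
    ],
}

for row in plan_rows("Contest", sample):
    print(row)
```

The parent index plays the role of the foreign key: in the Django version, the parent has to be saved first so that its children can point at it.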

But in the end, I would discourage this: you are using schema-based storage (SQL), so your code should enforce that the input matches your schema (you can't handle anything different on the fly anyway). If you don't care about having a schema at all, go for a NoSQL solution and you won't have this problem. Or use a hybrid like PostgreSQL, which can store JSON documents alongside relational tables.
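
One way to enforce that the input matches the schema is to check the shape of the dict before creating anything. The following is a minimal sketch, not a full validator: EXPECTED and find_unexpected_keys are hypothetical names, the key sets are copied by hand from the models.py in the question, and it only reports keys the schema does not know about (it does not check for missing keys or value types).

```python
# Hypothetical description of the expected shape: for each level, the scalar
# keys that are allowed and the nested list keys with the level they lead to.
EXPECTED = {
    "contest": {
        "fields": {"assoc", "contest", "location", "year", "date", "type", "filename"},
        "children": {"judges": "judge", "contestants": "contestant"},
    },
    "judge": {"fields": {"cat", "name"}, "children": {}},
    "contestant": {
        "fields": {"name", "tot_score", "prev_tot_score", "rank", "rank_m",
                   "rank_s", "rank_p", "pc_score", "director", "size"},
        "children": {"songs": "song", "singers": "singer"},
    },
    "song": {"fields": {"title", "m", "s", "p"}, "children": {}},
    "singer": {"fields": {"part", "name"}, "children": {}},
}


def find_unexpected_keys(level, obj, path="$"):
    """Return the paths of keys that the expected shape does not mention."""
    spec = EXPECTED[level]
    problems = []
    for key, value in obj.items():
        if key in spec["children"]:
            if not isinstance(value, list):
                problems.append(f"{path}.{key} (expected a list)")
                continue
            # recurse into every child dict of a known nested list
            for i, child in enumerate(value):
                problems += find_unexpected_keys(
                    spec["children"][key], child, f"{path}.{key}[{i}]"
                )
        elif key not in spec["fields"]:
            problems.append(f"{path}.{key}")
    return problems


good = {"contest": "QUARTET FINAL", "judges": [{"cat": "m", "name": "Rod"}]}
bad = {"contest": "QUARTET FINAL", "judges": [{"cat": "m", "nickname": "Rod"}]}

print(find_unexpected_keys("contest", good))  # []
print(find_unexpected_keys("contest", bad))   # ['$.judges[0].nickname']
```

Running a check like this first means a malformed file is rejected before any rows are written, rather than failing halfway through the import.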

Upvotes: 1
