grommit
grommit

Reputation: 179

populate connection tables with sqlalchemy ORM for a complex schema

I'm trying to import a complex data set of regulatory classifications into a new database. The complexity comes from having disparate kinds of classifiers being assigned to the same classified objects. There are hierarchical (one-to-many) "Categories" and non-hierarchical (many-to-many) "Attributes" with an assigned range of values for each "Attribute". The problem is with populating data. Since non-hierarchical "Attributes" are not the same for each of the lowest "Categories", simply marking a secondary relationship attribute and an a.append(b) is not going to produce the desired input for connection tables. JSON would be a good format for this, but it's not feasible due to a high number of records. I'm looking for the best way to retain and populate the schema offered below. Here's a simplified example:

Supercat  Category  Subcategory Attr.Type Attribute  Attr.Value

Vertebrae Mammals   Cats        Color     Hair Color Blue
                                          Hair Color Black
                                          Eye Color  Blue
                                          Eye Color  Green
                                Magic     Yes/No     Maybe
                                Legs      Count      4
                                Habitat   Land/Sea   Land
                    Dogs        Color     Hair Color Brown
                                          Hair Color Black
                                          Eye Color  Blue
                                          Eye Color  Green
                                Magic     Yes/No     No
                                Legs      Count      4
                                Tailwags  Happy      Yes
                                          Unhappy    No
                                Habitat   Land/Sea   Land
                    Whales      Color     Skin Color Blue
                                          Skin Color Grey
                                Habitat   Land/Sea   Sea
                    Unicorns    Color     Hair Color Rainbow
                                Magic     Yes/No     Yes
                                Habitat   Land/Sea   Unknown
                                Legs      Count      4
                                          Hooves     Yes
          Fish      Sharks      Color     Skin Color White
                                          Skin Color Grey
                                Magic     Yes/No     No
                                Habitat   Land/Sea   Sea
                    Goldfish    Color     Scales     Gold
                                Habitat   Land/Sea   Sea

Here's the SQLAlchemy ORM representation of the schema for this getup:

#connection tables:
attribute_type_attributes = db.Table('attribute_type_attributes',
    db.Column('type_id', db.ForeignKey('attribute_types.id'), ...),
    db.Column('attribute_id', db.ForeignKey('attributes.id'), ...)
)

attribute_values_attributes = db.Table('attribute_type_attributes',
    db.Column('type_id', db.ForeignKey('attribute_types.id'), ...),
    db.Column('value_id', db.ForeignKey('attribute_values.id'), ...)
)

subcategory_attribute_types = db.Table('subcategory_attribute_types',
    db.Column('type_id', db.ForeignKey('attribute_types.id'), ...),
    db.Column('subcat_id', db.ForeignKey('subcats.id'), ...)
)

subcategory_attributes = db.Table('subcategory_attributes',
    db.Column('type_id', db.ForeignKey('attributes.id'), ...),
    db.Column('subcat_id', db.ForeignKey('subcats.id'), ...)
)

subcategory_attribute_values = db.Table('subcategory_attribute_values',
    db.Column('type_id', db.ForeignKey('attribute_values.id'), ...),
    db.Column('subcat_id', db.ForeignKey('subcats.id'), ...)
)
#models:
class Supercat(db.Model):
    id = db.Column(...)
    name = db.Column(...)

class Cat(db.Model):
    id = db.Column(...)
    supercat_id = db.Column(..., db.ForeignKey('supercats.id'), ...)
    name = db.Column(...)

class Subcat(db.Model):
    id = db.model(...)
    cat_id = db.Column(..., db.ForeignKey('cats.id'), ...)
    name = db.Column(...)

class Attribute_type(db.Model):
    __tablename__ = 'attribute_types'
    id = db.Column(...)
    name = db.Column(...)

class Attribute(db.Model):
    __tablename__ = 'attributes'
    id = db.Column(...)
    name = db.Column(...)

class Attribute_value(db.Model):
    __tablename__ = 'attribute_values'
    id = db.Column(...)
    name = db.Column(...)

Once again, different hierarchical "Categories" have different non-hierarchical "Attributes", and they correlate to different sets of values. I cannot have the non-hierarchical "Attributes" or their values related to the wrong "Subcategory" because the data are not mine to alter.

Any thoughts?

Upvotes: 1

Views: 245

Answers (1)

melcher
melcher

Reputation: 1601

To store the category hierarchy you could use 3 tables (super-cat, cat, sub-cat) as you currently have or a single table (category w/ parent_category_id). I'd probably use a single table but separate tables also seems fine.

Then have a table to store the kinds of attributes (e.g. 'Hair color') in an attributes table with id, attribute_type, and name columns. You can either store attribute_type as a string (e.g. Color) or as a reference to an attribute_types table that stores the name Color.

Finally, to store the attribute values for each sub-category I would have a table (e.g. subcategory_attributes) that references both the sub-category table and the attributes table and stores the value (e.g. 'Blue'). You can also pull out the values into their own table and just reference them if you want to further normalize the data/if it makes sense.

Here's some terrible ascii art depicting the schema.

[ super-categories ]      [  attribute_types (name:Color)    ]
        ^                                    ^
        |                                    |
[   categories     ]      [     attributes (name:Hair color) ]
        ^                                    ^
        |                                    |
[  subcategories   ]  <-- [ subcategory_attributes (value:Blue) ]  

Maybe I'm missing something but I think I'm missing the difficulty in associating one or more attribute-values (that have a specific attribute type, etc) with a specific sub-category.

Upvotes: 1

Related Questions