Reputation: 179
I'm trying to import a complex data set of regulatory classifications into a new database. The complexity comes from having disparate kinds of classifiers being assigned to the same classified objects. There are hierarchical (one-to-many) "Categories" and non-hierarchical (many-to-many) "Attributes" with an assigned range of values for each "Attribute". The problem is with populating data. Since non-hierarchical "Attributes" are not the same for each of the lowest "Categories", simply marking a secondary
relationship attribute and an a.append(b)
is not going to produce the desired input for connection tables. JSON would be a good format for this, but it's not feasible due to a high number of records. I'm looking for the best way to retain and populate the schema offered below. Here's a simplified example:
Supercat Category Subcategory Attr.Type Attribute Attr.Value
Vertebrae Mammals Cats Color Hair Color Blue
Hair Color Black
Eye Color Blue
Eye Color Green
Magic Yes/No Maybe
Legs Count 4
Habitat Land/Sea Land
Dogs Color Hair Color Brown
Hair Color Black
Eye Color Blue
Eye Color Green
Magic Yes/No No
Legs Count 4
Tailwags Happy Yes
Unhappy No
Habitat Land/Sea Land
Whales Color Skin Color Blue
Skin Color Grey
Habitat Land/Sea Sea
Unicorns Color Hair Color Rainbow
Magic Yes/No Yes
Habitat Land/Sea Unknown
Legs Count 4
Hooves Yes
Fish Sharks Color Skin Color White
Skin Color Grey
Magic Yes/No No
Habitat Land/Sea Sea
Goldfish Color Scales Gold
Habitat Land/Sea Sea
Here's the SQLAlchemy ORM representation of the schema for this getup:
#connection tables:
attribute_type_attributes = db.Table('attribute_type_attributes',
db.Column('type_id', db.ForeignKey('attribute_types.id'), ...),
db.Column('attribute_id', db.ForeignKey('attributes.id'), ...)
)
attribute_values_attributes = db.Table('attribute_type_attributes',
db.Column('type_id', db.ForeignKey('attribute_types.id'), ...),
db.Column('value_id', db.ForeignKey('attribute_values.id'), ...)
)
subcategory_attribute_types = db.Table('subcategory_attribute_types',
db.Column('type_id', db.ForeignKey('attribute_types.id'), ...),
db.Column('subcat_id', db.ForeignKey('subcats.id'), ...)
)
subcategory_attributes = db.Table('subcategory_attributes',
db.Column('type_id', db.ForeignKey('attributes.id'), ...),
db.Column('subcat_id', db.ForeignKey('subcats.id'), ...)
)
subcategory_attribute_values = db.Table('subcategory_attribute_values',
db.Column('type_id', db.ForeignKey('attribute_values.id'), ...),
db.Column('subcat_id', db.ForeignKey('subcats.id'), ...)
)
#models:
class Supercat(db.Model):
id = db.Column(...)
name = db.Column(...)
class Cat(db.Model):
id = db.Column(...)
supercat_id = db.Column(..., db.ForeignKey('supercats.id'), ...)
name = db.Column(...)
class Subcat(db.Model):
id = db.model(...)
cat_id = db.Column(..., db.ForeignKey('cats.id'), ...)
name = db.Column(...)
class Attribute_type(db.Model):
__tablename__ = 'attribute_types'
id = db.Column(...)
name = db.Column(...)
class Attribute(db.Model):
__tablename__ = 'attributes'
id = db.Column(...)
name = db.Column(...)
class Attribute_value(db.Model):
__tablename__ = 'attribute_values'
id = db.Column(...)
name = db.Column(...)
Once again, different hierarchical "Categories" have different non-hierarchical "Attributes", and they correlate to different sets of values. I cannot have the non-hierarchical "Attributes" or their values related to the wrong "Subcategory" because the data are not mine to alter.
Any thoughts?
Upvotes: 1
Views: 245
Reputation: 1601
To store the category hierarchy you could use 3 tables (super-cat, cat, sub-cat) as you currently have or a single table (category w/ parent_category_id). I'd probably use a single table but separate tables also seems fine.
Then have a table to store the kinds of attributes (e.g. 'Hair color') in an
attributes
table with id
, attribute_type
, and name
columns. You can either store attribute_type
as a string (e.g. Color
) or as a reference to an attribute_types
table that stores the name Color
.
Finally, to store the attribute values for each sub-category I would have a table (e.g. subcategory_attributes
) that references both the sub-category table and the attributes table and stores the value
(e.g. 'Blue'). You can also pull out the values
into their own table and just reference them if you want to further normalize the data/if it makes sense.
Here's some terrible ascii art depicting the schema.
[ super-categories ] [ attribute_types (name:Color) ]
^ ^
| |
[ categories ] [ attributes (name:Hair color) ]
^ ^
| |
[ subcategories ] <-- [ subcategory_attributes (value:Blue) ]
Maybe I'm missing something but I think I'm missing the difficulty in associating one or more attribute-values (that have a specific attribute type, etc) with a specific sub-category.
Upvotes: 1