Lucas Stephanou

Reputation: 99

Is using schema.org as a database schema a rational approach?

In a project for a news/content platform, our team is debating whether to use the schema.org vocabulary as a direct source for modeling our database and our domain logic.

Example: The NewsArticle entity has the following hierarchy on schema.org:

Thing > CreativeWork > Article > NewsArticle 

Our domain would have one class for each of them and use inheritance to form the hierarchy. The same pattern would be applied to the database, meaning we would create four tables (or maybe use a document database).

Fields at each level would go on the "right" class/table, meaning that a NewsArticle instance would need to pull in the content of every ancestor level to compose its full representation.
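To make the proposal concrete, here is a minimal sketch of the table-per-class layout using Python's built-in sqlite3 module. The table names, the `load_news_article` helper, and the sample row are hypothetical; the columns are a small, arbitrary subset of the properties schema.org defines at each level.

```python
import sqlite3

# Hypothetical table-per-class layout: one table per schema.org level,
# each sharing the primary key of its parent level.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (
    id INTEGER PRIMARY KEY,
    name TEXT,
    url TEXT
);
CREATE TABLE creative_work (
    id INTEGER PRIMARY KEY REFERENCES thing(id),
    headline TEXT,
    date_published TEXT
);
CREATE TABLE article (
    id INTEGER PRIMARY KEY REFERENCES creative_work(id),
    article_body TEXT
);
CREATE TABLE news_article (
    id INTEGER PRIMARY KEY REFERENCES article(id),
    dateline TEXT
);
""")

# Storing a single NewsArticle means writing one row per level.
conn.execute("INSERT INTO thing VALUES (1, 'Budget vote', 'https://example.com/budget')")
conn.execute("INSERT INTO creative_work VALUES (1, 'Budget passes', '2015-03-01')")
conn.execute("INSERT INTO article VALUES (1, 'The budget passed on Sunday.')")
conn.execute("INSERT INTO news_article VALUES (1, 'LISBON')")


def load_news_article(conn, article_id):
    """Compose the full representation by joining every ancestor table."""
    row = conn.execute(
        """
        SELECT t.id, t.name, t.url,
               cw.headline, cw.date_published,
               a.article_body,
               na.dateline
        FROM news_article AS na
        JOIN article AS a ON a.id = na.id
        JOIN creative_work AS cw ON cw.id = a.id
        JOIN thing AS t ON t.id = cw.id
        WHERE na.id = ?
        """,
        (article_id,),
    ).fetchone()
    if row is None:
        return None
    keys = ("id", "name", "url", "headline", "date_published",
            "article_body", "dateline")
    return dict(zip(keys, row))


print(load_news_article(conn, 1))
```

Every read of a NewsArticle costs three joins and every write touches four tables, which is the kind of overhead I am worried about.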

Although schema.org would be a good reference to aid our domain model design (how to name fields and so on), a direct fit seems naive and even harmful, without clear benefits to compensate for the investment and bloat it would bring.

Do you see benefits in this approach? Do you see problems?

PS: This is not about using schema.org and related vocabularies (RDFa, rNews) to mark up web pages (which I encourage).

Upvotes: 7

Views: 2057

Answers (1)

FrobberOfBits

Reputation: 18022

tl;dr: don't make these schemas the basis of your database model or logic, but, if they're important to you, maintain clean mappings between your own models and these schemas.

(Longer version below)

When you're writing a complex web app, you're not going to be able to avoid the need for multiple models. There might be an RDBMS physical model, there might be an OOP "object model", and there's probably going to be a web front end data model (whether in JSON, or in the HTML DOM using schema.org's stuff). These models will be in different formalisms, and will have different strengths and weaknesses.

Just between an RDBMS and your object hierarchy, you'll run into the object-relational impedance mismatch pretty much no matter what you do. This is a really hard problem, and various ORMs, MVC frameworks, and data binding tools exist to help you map OOP objects to database constructs.

Now, if you were to use schema.org's structures straight in your database, it might seem that this would make your life easier because the model is the same everywhere. But because of impedance mismatches, it simply won't be. When you go to do this in a database, you'll start by defining primary/foreign key relationships to get between the entities that schema.org defines (they don't provide them, because it isn't a relational model). That will be just the beginning of the drift away from their model to something that inherently must be relational. In your object model, you won't have PKs/FKs, you'll have object references. Starting with a physical model will make reusing other toolkits harder. And in the end, your custom object stack won't easily serialize to schema.org's structure in HTML without a lot of extra code that you have to write.

Different formalisms call for different models. Understanding the mapping between these (and data requirements traceability) is really important, but I think you should give up on the notion that you'll reuse those models as is, because it probably can't happen. Impedance mismatches are things that you can't make go away; you can only make smart decisions about how you'd best like to cope with and manage them.
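As a rough illustration of what a "clean mapping" could look like, here is a minimal Python sketch, not a definitive implementation. The `Story` class, its field names, the `to_schema_org` function, and the example.com base URL are all hypothetical; only the schema.org property names (`headline`, `articleBody`, `datePublished`, `articleSection`, `url`) come from the vocabulary itself.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Story:
    """Hypothetical domain object, shaped for the application rather than schema.org."""
    slug: str
    title: str
    body: str
    published_on: date
    desk: str


def to_schema_org(story, base_url="https://example.com/news/"):
    """Map the domain object to a schema.org NewsArticle as a JSON-LD dict.

    All knowledge of schema.org naming lives here, in one place.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "url": base_url + story.slug,
        "headline": story.title,
        "articleBody": story.body,
        "datePublished": story.published_on.isoformat(),
        "articleSection": story.desk,
    }


story = Story(
    slug="budget-passes",
    title="Budget passes",
    body="The budget passed on Sunday.",
    published_on=date(2015, 3, 1),
    desk="Politics",
)
print(json.dumps(to_schema_org(story), indent=2))
```

The database and object model stay free to follow your query load, and if schema.org changes or you add another output format, only the mapping function needs to change.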

Feel free to steal their naming conventions, structure, and ideas, but I'd give up on reusing it verbatim. Also, what are the chances that their model is best for your query load?

Upvotes: 5
