user1960364
user1960364

Reputation: 2001

Meta Tables in MySQL

I'm rewriting a system that is currently linked to a MySQL database that is roughly 1GB in size. There are hundreds of thousands of articles, each with a list of contributors (think Wiki style). I've not yet been given access to the existing database schema, but while I wait I've been brainstorming a bit.

Basically, what I'm wondering is if having an article_contributors table would be an efficient way of handling this or if there is a better method to approaching this situation. Considering there are roughly 200,000 articles, if there are 5 contributors on each, that'd be 1,000,000 rows in the meta table.

Upvotes: 3

Views: 808

Answers (2)

O. Jones
O. Jones

Reputation: 108651

An interesting question!

You're going to need to see the schema to get a straight answer about this. That's because the schema probably embodies some core decisions made by experts in bibliography (reference librarians, etc).

If you try use a join table (articles_contributors) so you can avoid listing a given contributor multiple times when she contributes to multiple articles, you're implicitly declaring that you can create a canonical list of contributors, with a contributor_id for each distinct person.

In the world of bibliography and library science, that sort of list is called a "controlled vocabulary" It's controlled by an "authority." (Read this: http://en.wikipedia.org/wiki/Authority_control) That is, some organization has the responsibility to decide whether this "Jane Smaith" is a different person from that "Jane Smith." That is surprisingly hard to do correctly with people.

For an example of a relatively simple controlled vocabulary, see the "North American Industry Classification System" (NAICS). This has a code for each distinct kind of industry. http://www.census.gov/eos/www/naics/ It's controlled by national committees in three countries. Many bibliographic databases that cover industry include those terms as one of the ways of classifying their contents.

The designers of the system you're soon to take over will have made decisions about these kinds of controlled vocabularies. Will they have one for contributors? You could wait and see, or you could ask. But one thing is sure: the bibliographic designers won't be too delighted if you, on your own authority, create that kind of controlled vocabulary.

The Library of Congress in the USA doesn't attempt to create a controlled list of authors and contributors.

Edit

If you do have a definitive list of contributors, it is a good idea to create a join table articles_contributors as you suggested. You should consider the following columns:

 article_id        primary key
 contributor_id    primary key
 role              primary key   values like ("author", "illustrator", "editor", etc)
 order             1, 2, 3  so contributors can be listed in proper order.
 contact           1 or 0  indicating whether readers should contact this author for more info.

Upvotes: 1

Bill Karwin
Bill Karwin

Reputation: 562330

I'd call that a one-to-many table, not a "meta" table. Or else a multi-valued attribute.

Storing contributors in a separate table, one per row, is the proper way of designing a relational database. There may be other ways to store the data, but they are not relational.

Consider my answer to Is storing a delimited list in a database column really that bad? Storing the contributors as a list in the articles table causes a lot of common SQL queries to break or become horribly inefficient. If you need to do a variety of queries against this data, you will thank yourself for storing it in a normalized fashion.

On the other hand, if you never query anything but the list of contributors as an indivisible unit, then why not store it denormalized (as a list)? That's a valid choice too -- but it depends on how you're going to use the table.

By the way, 1 million rows is not a large MySQL database by some people's standards. This week I'm advising a client who has a table with 900 million rows.

Upvotes: 2

Related Questions