Reputation: 1024
I'm designing a database for a social app and am trying to determine if my approach is 1) going to perform well, and 2) properly normalized?
My research on tag query performance and db design concluded that a single tags table with full text index search yields the best performance.
See this: http://tagging.pui.ch/post/37027746608/tagsystems-performance-tests
I know I could (and should from a pure normalization standpoint) put the tags in a separate table with a key per tag, but performance would suffer as the db grows large (according to the linked article). Tag searching is a key component to my app and must perform well.
The below structure illustrates a basic approach I've devised that uses a bridge metadata table, and I expect many more "object tables" to be bridged using this single table, but I provide only a couple to give the idea:
Users Table: UserID PK, UserName, Etc
Blogs Table: BlogID PK, UserID FK, BlogTxt, Etc
Photos Table: PhotoID PK, UserID FK, PhotoPath, Etc
Metadata Table: MetadataID PK, UserID FK, ObjectTable (Posts or Blogs), ObjectID FK (PostID or BlogID), Tags (tag1,tag2,tag3)
In addition to the above questions, I'm also interested to know if there are better alternatives. I'm new to db design so please excuse any serious ignorance on the proper way of doing this. Thanks much.
Upvotes: 2
Views: 364
Reputation: 78513
My research on tag query performance and db design concluded that a single tags table with full text index search yields the best performance.
This is actually incorrect...
The best performance you can get is to switch to a database engine that has an array type and bitmap index scans, maintain an aggregate of your tags in an int[] array column using triggers, and add an appropriate index (gin, gist, rtree) on it.
This allows to write queries (Postgres syntax below) such as:
create index on posts using gin (tags);
-- bitmap AND/OR index scan on posts
-- has 1 or 2 or 3 or any of 4, 5, 6 without 7 or 8
select *
from posts
where tags && array[1,2,3]
or tags && array[4,5,6] and not tags && array[7,8]
The above will blow away any potential optimization you can think of using MySQL.
Upvotes: 1