Should I normalize similar schema even if the data may be conceptually unrelated?

Question

Lets say I have a table of things, and I need to keep some auditing information on who, when and where was something done to the things.

A basic schema could look like:

Things
  - ID
  - ThingName
  - CreatedOn
  - CreatedBy
  - CreatedIn
  - LastModifiedOn
  - LastModifiedBy
  - LastModifiedIn
  - HiddenOn (nullable)
  - HiddenBy (nullable)
  - HiddenIn (nullable)

This bugs me a bit. Coming from OOP, and given that this data will be primarily consumed by linq-to-sql, it feels like I could extract the repeated fields into a separate structure, which would also act as a schema contract of sorts:

Actions
  - ID
  - ExecutedOn
  - ExecutedBy
  - ExecutedIn

Things
  - ID
  - ThingName
  - CreatedAction -> Actions.ID
  - LastModifiedAction -> Actions.ID
  - HiddenAction (nullable) -> Actions.ID

I could then reuse the Actions table for storing the same auditing information for other Things I might have in the database. There would be a lot of foreign keys pointing to this table.

I'm concerned on whether aggregating data related to many parts of the database could cause problems in the long run. I'm wondering,

Would this become a source of insert contention (Working with SQL Server 2008)?
Would searching by those fields become more expensive because there will be a lot more rows, and could I mitigate that by indexing it on a discriminator?
Generally, good idea or bad?

Thanks

Philip Kelley · Accepted Answer

By and large, this seems a fairly solid design, though much depends upon the specifics and details of your acutal business environment. Some thoughts:

You are tracking many entities: a number of kinds of things, and the “Action” entity. Action is no longer part of a thing, it is its own thing. (Fortunately, you shouldn’t need to enter an Action to log work done on an Action, right?)
If you want to quickly pull out actions taken upon one certain kind of thing, you may want to have a column (your “discriminator”) in the Actions table denoting the kind of thing the Action was taken upon. (Having to join on N different thing tables just to detrmine type tends to kill performance.)
Can a single action have an impact on more than one thing (say, create a new thing and modify an existing thing)? That would be more complex to model.
Here, you log all actions take on an thing in the Actions table, and in the thing table only record the most recent Actions taken of 3 action types (Create, Modify, Hide). When a new action of one of those types is taken, the last one (if any) is bumped from the thing table… but that “old” action event is still recorded within the Action table. For your business purposes, is useful or irrelevant data?
As for insert contention, it would depend on the RDBMS you use. This is an old old problem, and most modern systems have gotten pretty good at dealing with it. Build, test, watch for blocking and long transaction issues, and you should be fine.
If the table gets big, build indexes. If you frequently search or filter by action type, add that discriminator column and include it in the index, (Again, different RDBMSs have different indexing features, read the documentation to see what might work best for you.)

Good idea? Bad idea? Totally depends upon the nature of your business environemnt and its data saving/retrieving requirements. Hopfeully this helps you in your analysis and decision making.

Should I normalize similar schema even if the data may be conceptually unrelated?

Answers (1)

Related Questions