sscirrus
sscirrus

Reputation: 56719

A database design for variable column names

I have a situation that involves Companies, Projects, and Employees who write Reports on Projects.

A Company owns many projects, many reports, and many employees.
One report is written by one employee for one of the company's projects.

Companies each want different things in a report. Let's say one company wants to know about project performance and speed, while another wants to know about cost-effectiveness. There are 5-15 criteria, set differently by each company, which ALL apply to all of that company's project reports.

I was thinking about different ways to do this, but my current stalemate is this:

I am completely aware that this is typically considered horrible database design - inelegant and inflexible. So, I need your help! How can I build this better?


Conclusion

I decided to go with the serialized option in my case, for these reasons:

  1. My requirements for the criteria are simple - no searching or sorting will be required of the reports once they are submitted by each employee.
  2. I wanted to minimize database load - where these are going to be implemented, there is already a large page with overhead.
  3. I want to avoid complicating my database structure for what I believe is a relatively simple need.
  4. CouchDB and Mongo are not currently in my repertoire so I'll save them for a more needy day.

Upvotes: 2

Views: 688

Answers (3)

Jordão
Jordão

Reputation: 56467

Create a criteria table that lists the criteria for each company (company 1 .. * criteria).

Then, create a report_criteria table (report 1 .. * report_criteria) that lists the criteria for that specific report based on the criteria table (criteria 1 .. * report_criteria).

Upvotes: 0

Jakub Hampl
Jakub Hampl

Reputation: 40533

This would be a great opportunity to use NoSQL! Seems like the textbook use-case to me. So head over to CouchDB or Mongo and start hacking.

With conventional DBs you are slightly caught in the problem of how much to normalize your data:

  1. A sort of "good" way (meaning very normalized) would look something like this:

    class Company < AR::Base
      has_many :reports
      has_many :criteria
    end
    
    class Report < AR::Base
      belongs_to :company
      has_many :criteria_values
      has_many :criteria, :through => :criteria_values
    
    end
    
    class Criteria < AR::Base # should be Criterion but whatever
      belongs_to :company
      has_many :criteria_values
      # one attribute 'name' (or 'type' and you can mess with STI)
    end
    
    class CriteriaValues < AR::Base
      belongs_to :report
      belongs_to :criteria
      # one attribute 'value'
    end
    

    This makes something very simple and fast in NoSQL a triple or quadruple join in SQL and you have many models that pretty much do nothing.

  2. Another way is to denormalize:

    class Company < AR::Base
      has_many :reports
      serialize :criteria
    end
    
    class Report < AR::Base
      belongs_to :company
      serialize :criteria_values
    
      def criteria
        self.company.criteria
      end
      # custom code here to validate that criteria_values correspond to criteria etc.
    end
    

    Related to that is the rather clever way of serializing at least the criteria (and maybe values if they were all boolean) is using bit fields. This basically gives you more or less easy migrations (hard to delete and modify, but easy to add) and search-ability without any overhead.

    A good plugin that implements this is Flag Shih Tzu which I've used on a few projects and could recommend.

  3. Variable columns (eg. crit1, crit2, etc.).

    I'd strongly advise against it. You don't get much benefit (it's still not very searchable since you don't know in which column your info is) and it leads to maintainability nightmares. Imagine your db gets to a few million records and suddenly someone needs 16 criteria. What could have been a complete no-issue is suddenly a migration that adds a completely useless field to millions of records.

    Another problem is that a lot of the ActiveRecord magic doesn't work with this - you'll have to figure out what crit1 means by yourself - now if you wan't to add validations on these fields then that adds a lot of pointless work.

So to summarize: Have a look at Mongo or CouchDB and if that seems impractical, go ahead and save your stuff serialized. If you need to do complex validation and don't care too much about DB load then normalize away and take option 1.

Upvotes: 3

Pete Wilson
Pete Wilson

Reputation: 8694

Well, when you say "To company table, add text field criteria, which contains an array of the criteria desired in order" that smells like the company table wants to be normalized: you might break out each criterion in one of 15 columns called "criterion1", ..., "criterion15" where any or all columns can default to null.

To me, you are on the right track with your report table. Each row in that table might represent one report; and might have corresponding columns "criterion1",...,"criterion15", as you say, where each cell says how well the company did on that column's criterion. There will be multiple reports per company, so you'll need a date (or report-number or similar) column in the report table. Then the date plus the company id can be a composite key; and the company id can be a non-unique index. As can the report date/number/some-identifier. And don't forget a column for the reporting-employee id.

Any and every criterion column in the report table can be null, meaning (maybe) that the employee did not report on this criterion; or that this criterion (column) did not apply in this report (row).

It seems like that would work fine. I don't see that you ever need to do a join. It looks perfectly straightforward, at least to these naive and ignorant eyes.

Upvotes: 0

Related Questions