Reputation: 21
I'll at least try to keep this succinct.
Let's suppose we're tracking the balances of accounts over time. So our fact table will have columns such as...
Account Balance Fact Table
Obviously you have an Account Dimension Table and a Date Dimension Table. So now we can easily filter on Accounts or Dates (or date ranges, etc.).
But here's the kicker... Accounts can belong to Groups -- any number of Groups at a given Date. Groups are simply logical abstractions, and they have no tangible meaning aside from reporting purposes. An Account being in 0, 1, or 17 groups doesn't affect its Balance in any way. For example, AccountID 1 may be in Groups 38, 76, 104, and 159. Account 2 may be in Group 1 (which has a Group Description of "Ungrouped". Account 3 may be in seventeen groups (real example).
As an added bonus, our users are completely non-technical. They don't know SQL, they have no experience with relational databases, and have historically done all of their work in a convoluted Excel solution. Right now we're building a dimensional model they can slice and filter with PowerPivot, though these Account Groups are threatening to turn an otherwise ruthlessly simple Star Schema into something complex enough that the users will balk and return to their current spaghetti solution.
So let's look at our options...
Boolean Method The Boolean method is not feasible. We have about 570,000 different accounts, but more importantly, 26,000 different groups. This would also be a devil for end-users to filter, since they're non-technical and are relying on very simple tools to get this done.
Multiple Column Method In theory this could work, however, we do have some accounts that belong to 17 groups. Again, the groups are really just logical groups -- they have no meaning, but they are required by the business for reporting purposes. Having end-users filter out groups from 17 different columns isn't going to go over well in user-acceptance, and would likely result in users refusing to use the solution (and I don't blame them).
Bridge Table This count work, but we do have 26,000 different groups. I'm not finding this to be user-friendly.
Since I'm not liking my options, I can only assume there's a better way other than snowflaking... unless snowflaking IS the only way. If someone could lend a hand and explain their rationale it'd be appreciated.
UPDATE: For clarification, an example I think everyone here can relate to is imagine you can list keyword skills on a resume. They all relate to the same person, but you can have any number of skills. The skills don't effect any of individual measures on a resume -- i.e. 'C++' isn't more valuable than 'C#' -- you can't put all the resume/skill combinations in the fact table or you'd end up double counting (or well more than double ;) ).
I think the best I'm going to be able to do here is to create an outrigger table for groups. I'm not a fan of it, but I think it's the only real option I have.
So now we have...
Account Balance Fact Table
Account Dimension
Account Group Outrigger
Upvotes: 2
Views: 556
Reputation: 7525
If I correctly understood your question this should be okay:
CREATE TABLE IF NOT EXISTS `accounts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `accounts_groups` (
`account_id` int(11) NOT NULL,
`group_id` int(11) NOT NULL,
`start_date` date NOT NULL,
`end_date` date DEFAULT NULL,
UNIQUE KEY `account_group` (`account_id`,`group_id`,`start_date`),
KEY `group_id` (`group_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `account_balances` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`date` date NOT NULL,
`balance` decimal(11,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `account_id` (`account_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `groups` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
ALTER TABLE `accounts_groups`
ADD CONSTRAINT `accounts_groups_ibfk_1` FOREIGN KEY (`account_id`) REFERENCES `accounts` (`id`),
ADD CONSTRAINT `accounts_groups_ibfk_2` FOREIGN KEY (`group_id`) REFERENCES `groups` (`id`);
ALTER TABLE `account_balances`
ADD CONSTRAINT `account_balances_ibfk_1` FOREIGN KEY (`account_id`) REFERENCES `accounts` (`id`);
Upvotes: 0
Reputation: 89731
I would say you've got to start from the interface. How would users like to do their filtering in an ideal world?
I think I would end up going for a bridge or factless fact table or something like that. Perhaps a surrogate key on the fact table and a many-many link table from that to group membership.
It's definitely tough - and the interface and usage cases has to be made workable, so I'd start from there. Perhaps something will shake out of how they do this reporting - like equivalence classes in the groups or some way they partition the account space. Maybe there is a hierarchy or organization to the groups which make it more manageable and may inform a simpler design.
Upvotes: 1