scarpacci
scarpacci

Reputation: 9194

How to efficiently prune data

I am currently working on a problem for work where I need to take data and prune the generated scenarios based on user defined restrictions. I have tried a multitude of things, but can't seem to get anything to run as efficiently as I would like. I might have to run outside of the DB so I can scale the run, but thought I should try to perform inside the DB if possible. So for instance if I have 3 entities:

Transportation Type:
Car
Boat
Plane

Color:
Blue
Green
Red
Purple
White

Accessories:
Trailer
Wheels
Propeller
Parachute

Users Could Enter restrictions:

Transportation_Type=Boat, Accessories= Wheels

So any combination where you had a scenario that had Boat and Wheels would be restricted.

Example Valid Scenario with restriction: Boat/Red/Trailer

So where this gets complicated is that you can imagine that if I build all possible scenarios for the 3 entities that isn't too bad, even with user defined restrictions. But, what if there are like 22 entities(Entities is basically a level with values). You can imagine that this could get huge and would be difficult to apply restrictions. Especially when it is a set of Level / Values (So like the Boat and Wheels) that make up a restriction.

Anyone have any thoughts?

I was able to get it to be really performant through about 14-16 levels by building dynamic like statements that I could check the derived scenario against. But after that the process time explodes (which it could at lower levels if there were a lot more values in the level).

Upvotes: 2

Views: 238

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270593

If I undersand correctly, the goal is to generate scenarios that meet certain criteria. The scenarios would be generated from combinations of attributes.

Assuming that each entity is in a separate table, you could do the query as:

select *
from TransportationType tt cross join
     Color c cross join
     Accessories a
where tt.val in (<accepted transportation types>) and
      c.val in (<accepted colors>) and
      a.val in (<accepted accessories>)

If my understanding is correct, this will generate lots of scenarios as the number of entities increases. If you have a table of allowable scenarios (combinations of entities), then that would help filter things down.

I've shown this with separate tables for each entity, but you can replace them with subqueries:

from (select *
      from table t
      where t.type = 'TransportationType'
     ) TransportationType cross join
     ...

Upvotes: 1

A. M.
A. M.

Reputation: 1545

Your problem looks like a "Bill of Materials" problem (BOM). Each possibles valid scenario could be represented as a hierarchized system.

            Transportation
                   |
                  Type
                   |
               Accessories
                   |
      Trailer Wheels Propeller Parachute

Reading your question, Color is not a constraint so it does not need to be intregrated to the constraint tree.

SQL Server 2008 provides an extremely compact and fast type to encode these kind of hierarchies : HierarchyId type.

Using a lookup table with HierarchyId, you can easily define your constraints, answer to the scenario validity question and extract the corresponding scenario results.

A good example of BOM resolution with HierarchyId can be read on the MSDN Magazine, september 2008

Upvotes: 1

Related Questions