cgatian
cgatian

Reputation: 22994

Building a Relationship Between Attributes Or Columns Of Bits "Flatting it out"

I have the following SQL design issue. The code below might look a little much but basically I have a table of cars and another table of attributes the car could have. This makes complete sense to me to structure a table of attributes for an object using a linking table, #CarSupportedAtttibutes. Recently I've been tasked with doing something similar but use one table that has each of the Attributes as columns making it "flat". Similar to below:

[CarId][Name][Manual Transmission][Sunroof][Automatic Transmission][AWD]

I am told doing so it will boost the speed of my queries, but its starting to turn into a nightmare. In C# I have enumerated values for each of the car's attributes, 1 = Manual Transmission, so using the non "flat" version I am able to pull off a query pretty quickly as the SQL code below shows. Since I am being pushed to making the table flat for speed the only way I can think of is to take the enumerated value and build it into the where clause, using a case statement for every 1,2,3 and selecting off a column name.

To me it just makes more sense to organize the data like below. Like what if a new attribute about a car is needed, say "HEMI Engine". Not all cars are going to have this, in fact its going to be a rare case. But The way I am told to design is to keep the table "flat", so now I would be adding a Column called "Hemi Engine" to my table, instead of adding a row in my CarAttributes, and then only adding rows for the cars that have that as true.

Below is a snippet of the way I currently see approaching this problem, as opposed to doing a "flat" table (table with mostly columns of bits).

Question: What design makes more sense? Which is more maintainable? Am I completely crazy for thinking below is a better approach, and why?

CREATE TABLE #Car
(
    CarId INT,
    Name VARCHAR(250)
)

INSERT INTO #Car VALUES (1, 'Fusion')
INSERT INTO #Car VALUES (2, 'Focus')


CREATE TABLE #CarAttributes
(
    AttributeId INT,
    Name VARCHAR(250)
)

INSERT INTO #CarAttributes VALUES (1, 'Manual Transmission')
INSERT INTO #CarAttributes VALUES (2, 'SunRoof')


SELECT * FROM #CarAttributes

CREATE TABLE #CarSupportedAttributes
(
    AttributeId INT,
    CarId INT
)
INSERT INTO #CarSupportedAttributes VALUES (1,2)


--Determine if A Focus has a manual transmission
SELECT * FROM #Car c
INNER JOIN #CarSupportedAttributes csa
    ON csa.CarId = c.CarId
INNER JOIN #CarAttributes ca
    ON ca.AttributeId = csa.AttributeId 
WHERE c.Name = 'Focus'
    AND ca.AttributeId = 1

Upvotes: 1

Views: 158

Answers (1)

Adam Robinson
Adam Robinson

Reputation: 185643

Your approach is known as Entity-Attribute-Value, or EAV (yours is slightly modified, since in your model the presence of the attribute on the entity is the value, but the concept is the same).

EAV is usually considered an anti-pattern, but it can be appropriate in some cases. Basically, if either...

  • Your list of attributes is large and any given entity (car) will have only a small percentage of the total attributes
  • Your list of attributes is subject to frequent user change and they represent only data and not anything structural about the entity

Then EAV can be an appropriate choice. I can't answer either of those questions for you (though I have my suspicions), but it does seem like it might be appropriate in your case.

The other option, which is likely what most 6NF proponents would suggest, would be to have a table per attribute, like CarSunroof or CarManualTransmission. This would solve the first issue and the requirement of changing a table's definition whenever a new attribute is added, but would not address the issue of the user being able to change it.

Upvotes: 3

Related Questions