Candy Chiu

Reputation: 6679

SQL Server: UNION ALL vs Aggregate all columns into one table

I have a question on the performance of two designs. The goal is to store multiple types of entities where they share some attributes but also differ.

Approach 1: Multiple tables, each modelling one entity

Entity1 - C1, C2, C3
Entity2 - C1, C2, C4
Entity3 - C1, C2, C5    

To query, I need to perform a UNION ALL on all the tables.
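For illustration, the combined query under Approach 1 might look like the following sketch (column types and NULL placeholders are assumptions to align the differing columns):

```sql
-- Approach 1: query all entities together via UNION ALL,
-- padding each entity's missing columns with NULL
SELECT C1, C2, C3, NULL AS C4, NULL AS C5 FROM Entity1
UNION ALL
SELECT C1, C2, NULL, C4,   NULL FROM Entity2
UNION ALL
SELECT C1, C2, NULL, NULL, C5   FROM Entity3;
```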

Approach 2: Single table with all the columns and a type column

All - Type, C1, C2, C3, C4, C5

Here, I can query directly on the columns.
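A sketch of the equivalent query under Approach 2 (the table name and Type values are hypothetical):

```sql
-- Approach 2: one wide table; filter by Type when only one entity is needed
SELECT C1, C2, C3
FROM AllEntities
WHERE Type = 'Entity1';
```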

The question is: does the UNION ALL approach have any performance issues? This question is similar to a previously asked question on PostgreSQL, which has not been answered.

EDIT:

Thank you for all answers.

The entity tables are indexed by date, and the queries are most of the time filtered by date or by the shared fields. Suppose C1 is a date and C2 is a string; 95% of the queries look like C1 >= @from AND C1 <= @to, or C2 = 'SomeId'.

The number of records grows slowly, maybe a few hundred per entity per day. The number of columns won't grow beyond 150; however, the number of shared columns is small. I currently have Approach 1 implemented, because each entity may use fields other than the shared ones as its primary key. This way the constraints are more natural.
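Given that 95% of the queries filter on C1 (a date range) or C2 (equality), each entity table would typically carry indexes along these lines (index names are illustrative):

```sql
-- Illustrative indexes for the two dominant query patterns
CREATE INDEX IX_Entity1_C1 ON Entity1 (C1);  -- supports C1 >= @from AND C1 <= @to
CREATE INDEX IX_Entity1_C2 ON Entity1 (C2);  -- supports C2 = 'SomeId'
-- ...and likewise on Entity2 and Entity3
```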

Upvotes: 0

Views: 888

Answers (3)

i-one

Reputation: 5120

As an alternative, you may combine approaches 1 and 2, i.e. you may create an "ancestor" table:

All - ID, Type, C1, C2

And three "descendant" tables, where ID is the PK and at the same time an FK to the ID of the All table:

Entity1 - ID, C3
Entity2 - ID, C4
Entity3 - ID, C5
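A minimal sketch of this supertype/subtype layout in T-SQL (table names follow the answer; the column types are assumptions for illustration):

```sql
-- "Ancestor" table holds the shared columns
CREATE TABLE AllEntities (
    ID   INT IDENTITY PRIMARY KEY,
    Type TINYINT NOT NULL,
    C1   DATE,
    C2   VARCHAR(50)
);

-- Each "descendant" table's PK doubles as an FK back to the ancestor,
-- enforcing a strict one-to-one relationship
CREATE TABLE Entity1 (
    ID INT PRIMARY KEY REFERENCES AllEntities (ID),
    C3 INT
);
-- Entity2 (ID, C4) and Entity3 (ID, C5) follow the same pattern
```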

Upvotes: 0

ChrisProsser

Reputation: 13088

How many rows are you planning to have, roughly? I have experience of working with a large table like this where the single-table approach was chosen, and it is very slow to get any data back unless you are hitting one of the indexes (the table is approximately 250 columns by almost 1 billion rows).

Because of the number of columns, it is not practical to build an index for every common filtering criterion, as this would slow down inserts considerably on a transactional system. This example would certainly be a lot easier to work with if the tables were separate, with perhaps a view to put them together for the occasions when we had to query all of the data at once.

However, I am conscious that there are a lot of variables to consider. If you are working with a database that is used primarily for OLAP rather than OLTP, then you may not have any concerns about adding a lot of indexes, for example.

Upvotes: 1

HLGEM

Reputation: 96590

This choice depends greatly on how wide the table would need to be, whether there are any shared columns, how large the tables will be, what kind of queries you will be running against them, etc.

As a rule of thumb, do not combine the entities into one table if its width would be anywhere close to the maximum record width the database supports. Narrower tables tend to perform better. If you are talking about very few columns, a single table is likely the best solution.

If the common columns will be the ones most commonly queried, then consider designing a parent table with the common columns and three child tables for the type-specific ones.

If there are very few common columns, and the types will usually be queried by themselves (Type A and Type B would not generally both appear in the result set of the most frequently run queries), then separate tables, with a view that does the UNION ALL for the few times you need to query all of them, will work.
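Such a view might be sketched as follows (the view name and NULL padding are assumptions for illustration):

```sql
-- A view that unions the separate tables for the occasional cross-type query
CREATE VIEW AllEntitiesView AS
SELECT 'Entity1' AS EntityType, C1, C2, C3, NULL AS C4, NULL AS C5 FROM Entity1
UNION ALL
SELECT 'Entity2', C1, C2, NULL, C4,   NULL FROM Entity2
UNION ALL
SELECT 'Entity3', C1, C2, NULL, NULL, C5   FROM Entity3;
```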

If you only need to query all types for reporting but not all of the ordinary day-to-day stuff, consider having separate tables and a data warehouse for reporting.

Upvotes: 2
