Reputation: 150108
I have an object (happens to be C#) with about 20 properties that are nullable booleans. There will be perhaps a few million such objects persisted to a SQL database (currently SQL Server 2008 R2, but MySQL may need to be supported in the future). The instances themselves are relatively large because they contain about a paragraph of text as well as some other unrelated properties.
For a given object instance, most of the properties will be null most of the time.
When users search for instances of such objects, they will select perhaps 1-3 of the nullable boolean properties and search for instances where at least one of those 1-3 properties is non-null (OR search).
My first thought is to persist the object to a single table with nullable BIT columns representing the nullable boolean properties. However, this strategy will require one index per BIT column to avoid performing a table scan when searching. Further, each index would not be particularly selective since there are only three possible values per index.
Is there a better way to approach this problem?
Upvotes: 2
Views: 134
Reputation: 1269943
For performance reasons, I would suggest that you split the table into two tables.
Create a primary key with the bit fields used for indexes. Have another table that has the additional data (such as the paragraph). Use the first for WHERE conditions, joining in the second to get the data you want. Something like:
select p.*
from BitFields bf join
Paragraph p
on bf.bfid = p.bfid
where <conditions on the bit fields>
With a bunch of binary/ternary fields, I wouldn't think that indexes would help much, so the query engine will be resorting to a full table scan. If you put in the bit fields in one table, you can store the table in memory, and achieve good performance.
The alternative is to store the fields as name value pairs. If you really have lots of such fields (say many hundreds or thousands) and only a few are used in a given row (say a dozen or so), then an entity-attribute-value (EAV) structure might work better. This is a table with three important columns:
Upvotes: 1