Reputation: 95
We have a database with articles that, IMO, could be improved from a column-count perspective (we have more than 170 columns). Many of these columns only store booleans (1 or 0). Instead of having columns X, Z, Y, N separately, would it be more optimal to merge everything into one column called "XZYN", where each digit of the value represents the status of X, Z, Y, and N?
Example: 1000 would mean X = true and Z, Y, N = false. Of course, this state would be interpreted by our code.
Is this a good idea ?
Upvotes: 0
Views: 829
Reputation: 95561
From a storage point of view, storing the data in a single column like that would "cost" far more. A bit column (I assume you mean a bit when you say "bool") is very small in size, whereas to store a value like 1000 you would likely want an int. An int is 4 bytes in size, whereas a bit is (unsurprisingly) only 1 bit in size, and multiple bit columns are grouped into sets of 8:
The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.
This means that if you have 100 bit columns, then to store them as a concatenated string you would need 10 int columns or 6 bigint columns, taking up 40 or 48 bytes respectively. For 100 bit columns you would only be using 13 bytes (100 / 8 = 12.5, rounded up to 13 one-byte groups).
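As a rough sketch (all table and column names here are invented for illustration), the two designs being compared look something like this:

```sql
-- Hypothetical schema. SQL Server packs up to 8 bit columns
-- into a single byte per row, so the four flags below cost 1 byte.
CREATE TABLE dbo.ArticleFlags_Bits (
    ArticleID int NOT NULL PRIMARY KEY,
    X bit NOT NULL,
    Z bit NOT NULL,
    Y bit NOT NULL,
    N bit NOT NULL
);

-- The concatenated-digits design: 4 bytes per row for the same 4 flags.
CREATE TABLE dbo.ArticleFlags_Packed (
    ArticleID int NOT NULL PRIMARY KEY,
    XZYN int NOT NULL  -- e.g. 1000 means X = true, Z/Y/N = false
);
```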
Storing the data in a single column would also not be SARGable, and searching it would not be simple. You can't just divide the column or take the remainder, as the other "columns" would impact the division and remainder values. Instead you'd have to use something like SUBSTRING after adding any needed leading zeros to get at the relevant character, which is quite "ugly" in my opinion.
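To illustrate how awkward that gets (a sketch; the XZYN column is the hypothetical concatenated one from the question), extracting just the X flag would look something like:

```sql
-- Pad to a fixed width, then pull out the first character for "X".
-- Wrapping the column in expressions like this is not SARGable,
-- so filtering on it forces a scan.
SELECT CASE SUBSTRING(RIGHT('0000' + CONVERT(varchar(4), XZYN), 4), 1, 1)
            WHEN '1' THEN 1
            ELSE 0
       END AS X
FROM dbo.Articles;
```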
An alternative solution, however (though one I also don't recommend), is to use bitwise logic. This is where you assign each bit value a different power of two, aggregate them, and then use a bitwise operator to extract the value for the "column". Let's say, for example, you have 8 columns, A-H. You would assign each of these a digit in an 8-bit binary value:
a = 1 = 2^0
b = 2 = 2^1
c = 4 = 2^2
d = 8 = 2^3
e = 16 = 2^4
f = 32 = 2^5
g = 64 = 2^6
h = 128 = 2^7
So if a row wanted to have a true value for a, c, f, and g, the stored value would be 1 + 4 + 32 + 64 = 101. You can then check whether a given flag is true using the bitwise AND (&) operator:
SELECT CASE V.I & 1   WHEN 0 THEN 0 ELSE 1 END AS A,
       CASE V.I & 2   WHEN 0 THEN 0 ELSE 1 END AS B,
       CASE V.I & 4   WHEN 0 THEN 0 ELSE 1 END AS C,
       CASE V.I & 8   WHEN 0 THEN 0 ELSE 1 END AS D,
       CASE V.I & 16  WHEN 0 THEN 0 ELSE 1 END AS E,
       CASE V.I & 32  WHEN 0 THEN 0 ELSE 1 END AS F,
       CASE V.I & 64  WHEN 0 THEN 0 ELSE 1 END AS G,
       CASE V.I & 128 WHEN 0 THEN 0 ELSE 1 END AS H
FROM (VALUES (101)) V (I);
This, however, is again not SARGable, but at least it uses far less storage than storing a value like 10100110. If you are never going to filter on the column in the WHERE clause, then this might be worth exploring; but if there is even a chance you might, then don't (though a mixture of both bit columns and bitwise-packed columns, with the latter reserved for flags that never need to be filtered on, might not be "bad" as a way to reduce the column count).
My honest opinion: stick to the columns as they are. If the table really is "too wide", consider separating the bit columns into groups and putting them into separate tables with a 1-to-1 relationship to your current table.
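Such a 1-to-1 split might look something like this (names invented for illustration):

```sql
-- Side table sharing the parent's key; the PRIMARY KEY plus the
-- FOREIGN KEY enforce the 1-to-1 relationship.
CREATE TABLE dbo.ArticleFlags (
    ArticleID int NOT NULL PRIMARY KEY
        REFERENCES dbo.Articles (ArticleID),
    X bit NOT NULL DEFAULT 0,
    Z bit NOT NULL DEFAULT 0
    -- ...the remaining bit columns
);

-- Join back only when the flags are actually needed:
SELECT A.ArticleID, F.X, F.Z
FROM dbo.Articles AS A
JOIN dbo.ArticleFlags AS F ON F.ArticleID = A.ArticleID;
```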
Upvotes: 2
Reputation: 1269773
Is this a good idea? Probably not.
You are trying to over-optimize the database. The additional expense is parsing any resulting columns back into what you really need. That parsing adds overhead. More importantly, it makes the database harder to use.
It is worth pointing out that bit-packing (which is what I'll call what you want to do) does have some advantages, primarily in space savings. It is important to remember as well that less space means the database is faster.
If you have 30 such columns and they are currently stored as ints, then that is 120 bytes for the values plus an additional 30 bits for the corresponding NULL bits. You could pack these into four bytes and one NULL bit -- a significant savings.
However, you could store these as tinyint/char(1) or even bit. That would reduce the size to 30 bytes or even 4 bytes -- although you will still have the NULL bits taking up 30 bits of space. In other words, you can get basically the same effect just by switching types.
Or, you could remove all these columns altogether and just have another table of "attributes". There you would store one row per entity per attribute that is true, identified by name (or by a key into a reference table). For instance, instead of:
entityid  flag1  flag2  flag3
1         1      0      1
You would have another table with:
entityid  flag
1         'flag1'   -- or a reference to "flag1"
1         'flag2'
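A sketch of that attribute table and how you would query it (names are hypothetical):

```sql
CREATE TABLE dbo.EntityFlags (
    entityid int NOT NULL,
    flag varchar(50) NOT NULL,  -- or a key into a reference table
    PRIMARY KEY (entityid, flag)
);

-- A flag is true when its row exists, false otherwise:
SELECT e.entityid
FROM dbo.Entities AS e
WHERE EXISTS (SELECT 1
              FROM dbo.EntityFlags AS f
              WHERE f.entityid = e.entityid
                AND f.flag = 'flag1');
```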
This approach has several advantages:
Upvotes: 1
Reputation: 2535
Would anyone looking only at the DB ever know what those values mean, without the code to interpret it?
This would also make it very difficult to add or remove any of those flags - especially if you're trying to remove something that sits in the "middle" of the concatenated string.
If it really is messy, an alternative might be to extract the flags into another table that has a one-to-one relationship with the current master records. However, I don't know your data model well enough to really know whether or not that will be viable.
Are you trying to address an issue around performance or around readability?
Upvotes: 1