Reputation: 62652
Consider the following tables:
CREATE TABLE user_roles(
pkey SERIAL PRIMARY KEY,
bit_id BIGINT NOT NULL,
name VARCHAR(256) NOT NULL
);
INSERT INTO user_roles (bit_id,name) VALUES (1,'public');
INSERT INTO user_roles (bit_id,name) VALUES (2,'restricted');
INSERT INTO user_roles (bit_id,name) VALUES (4,'confidential');
INSERT INTO user_roles (bit_id,name) VALUES (8,'secret');
CREATE TABLE news(
pkey SERIAL PRIMARY KEY,
title VARCHAR(256),
company_fk INTEGER REFERENCES compaines(pkey), -- updated since asking the question
body VARCHAR(512),
read_roles BIGINT -- bit flag
);
read_roles is a bit-flag column that specifies which combination of roles can read a news item. So if I am inserting a news item that can be read by restricted and confidential, I would set read_roles to 2 | 4, or 6. When I want to get back the news posts that a particular user can see, I can use a query like:
select * from news WHERE company_fk=2 AND ((read_roles & 2) != 0 OR (read_roles & 4) != 0);
select * from news WHERE company_fk=2 AND read_roles = 6;
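For reference, here is a minimal sketch of the usual bit-flag tests and updates in PostgreSQL, assuming the bit values from user_roles above (2 = restricted, 4 = confidential, 8 = secret). The row with pkey = 1 is hypothetical and only there for illustration.

```sql
-- "Any of": rows readable by restricted OR confidential (mask 2 | 4 = 6)
SELECT * FROM news WHERE company_fk = 2 AND (read_roles & 6) <> 0;

-- "All of": rows that carry BOTH the restricted and confidential bits
SELECT * FROM news WHERE company_fk = 2 AND (read_roles & 6) = 6;

-- "Exactly": rows whose flags are restricted + confidential and nothing else
SELECT * FROM news WHERE company_fk = 2 AND read_roles = 6;

-- Set the secret bit (8) on a hypothetical row
UPDATE news SET read_roles = read_roles | 8 WHERE pkey = 1;

-- Clear the secret bit (8) again
UPDATE news SET read_roles = read_roles & ~8::bigint WHERE pkey = 1;
```

Note that testing with | instead of & would match every row, because OR-ing in a non-zero mask always yields a non-zero result.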
What are the disadvantages of using bit flags in database columns in general? I am assuming the answer to this question might be database specific, so I am interested in learning about disadvantages with specific databases.
I am using Postgres 9.1 for my application.
UPDATE I got the bit about the database not being able to use an index for bit operations, which would require a full table scan, which would suck for performance. So I have updated the question to reflect my situation more closely: each row in the database belongs to a specific company, so all the queries will have a WHERE clause that includes company_fk, which will have an index on it.
UPDATE I only have 6 roles right now, possibly more in the future.
UPDATE roles are not mutually exclusive and they inherit from each other, for example, restricted inherits all the permissions assigned to public.
Upvotes: 14
Views: 16720
Reputation: 5454
Adding to previous answers for SQL Server's implementation, you wouldn't save any space by having a single bitfield integer vs a pile of BIT NOT NULL columns:
The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or less bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.
As JNK mentioned, partial comparisons on a bitfield integer would not be SARGable, so an index on a bitfield integer would be useless unless comparing the entire value at once.
On-disk indexes on SQL Server are based on sorting, so to get to the rows that have any particular bit set in isolation would require a separate index for each bit column. One way to save space if you are only looking for 1s is to make them filtered indexes that only store the 1 values (rows with zero values will not have an index entry at all).
CREATE TABLE news(
pkey INT IDENTITY PRIMARY KEY,
title VARCHAR(256),
company_fk INTEGER REFERENCES compaines(pkey), -- updated since asking the question
body VARCHAR(512),
public_role BIT NOT NULL DEFAULT 0,
restricted_role BIT NOT NULL DEFAULT 0,
confidential_role BIT NOT NULL DEFAULT 0,
secret_role BIT NOT NULL DEFAULT 0
);
CREATE UNIQUE INDEX ByPublicRole ON news(public_role, pkey) WHERE public_role=1;
CREATE UNIQUE INDEX ByRestrictedRole ON news(restricted_role, pkey) WHERE restricted_role=1;
CREATE UNIQUE INDEX ByConfidentialRole ON news(confidential_role, pkey) WHERE confidential_role=1;
CREATE UNIQUE INDEX BySecretRole ON news(secret_role, pkey) WHERE secret_role=1;
select * from news WHERE company_fk=2 AND (restricted_role=1 OR confidential_role=1);
select * from news WHERE company_fk=2 AND restricted_role=1 AND confidential_role=1;
Both of those queries produce a nice plan with the random test data I produced.
As always, indexes should be based on actual query usage and balanced against maintenance cost.
Upvotes: 2
Reputation: 656982
If you only have a handful of roles, you don't even save any storage space in PostgreSQL. An integer column uses 4 bytes, a bigint 8 bytes. Both may require alignment padding.
A boolean column uses 1 byte. Effectively, you can fit four or more boolean columns for one integer column, eight or more for a bigint.
Also take into account that NULL values only use one bit (simplified) in the NULL bitmap.
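The per-value sizes mentioned above can be checked with PostgreSQL's pg_column_size() function. This is only a rough sketch: it reports the size of a single value and does not show the alignment padding or NULL bitmap overhead of a full row.

```sql
-- Size in bytes of one value of each type (padding between columns not included)
SELECT pg_column_size(1::integer) AS integer_bytes,
       pg_column_size(1::bigint)  AS bigint_bytes,
       pg_column_size(true)       AS boolean_bytes;
```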
Individual columns are easier to read and index. Others have commented on that already.
You could still utilize indexes on expressions or partial indexes to circumvent problems with indexes ("non-sargable"). Generalized statements like:
database cannot use indexes on a query like this
or
These conditions are non-SARGable!
are not entirely true - maybe for some other RDBMS lacking these features.
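As a rough sketch of what that could look like for the news table from the question, assuming the restricted role is bit 2 (the index names are made up for illustration):

```sql
-- Partial index: only rows with the restricted bit set get an index entry
CREATE INDEX news_restricted_idx
    ON news (company_fk)
    WHERE (read_roles & 2) <> 0;

-- Index on an expression: stores the result of the bit test per row
CREATE INDEX news_restricted_expr_idx
    ON news (company_fk, ((read_roles & 2) <> 0));

-- The query should repeat the indexed expression so the planner can match it
SELECT * FROM news
WHERE company_fk = 2
  AND (read_roles & 2) <> 0;
```

Whether the planner actually picks either index depends on data distribution, and you would need one such index per bit you query on.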
But why circumvent when you can avoid the problem altogether?
As you have clarified, we are talking about 6 distinct types (maybe more). Go with individual boolean columns. You'll probably even save space compared to one bigint. Space requirement seems immaterial in this case.
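A minimal sketch of that layout in PostgreSQL, assuming the four roles shown in the question (the column and index names are illustrative, and the foreign key on company_fk is left out so the snippet runs on its own):

```sql
CREATE TABLE news (
    pkey              SERIAL PRIMARY KEY,
    title             VARCHAR(256),
    company_fk        INTEGER,          -- foreign key omitted in this sketch
    body              VARCHAR(512),
    public_role       boolean NOT NULL DEFAULT false,
    restricted_role   boolean NOT NULL DEFAULT false,
    confidential_role boolean NOT NULL DEFAULT false,
    secret_role       boolean NOT NULL DEFAULT false
);

-- Optional partial index for a frequently queried flag
CREATE INDEX news_restricted_role_idx
    ON news (company_fk)
    WHERE restricted_role;

-- News readable by restricted OR confidential users of company 2
SELECT * FROM news
WHERE company_fk = 2
  AND (restricted_role OR confidential_role);
```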
If these flags were mutually exclusive, you could use one column of type enum or a small look-up table and a foreign key referencing it. (Ruled out in question update.)
Upvotes: 12
Reputation: 116110
Disadvantages: Hard to write data, hard to read data, hard to debug, but especially: slow queries because the database cannot use indexes on a query like this.
Advantages: you save a few bytes. Compared to a BIT field, you may save a few MB on a million-record table... hardly worth it. :)
Upvotes: 10
Reputation: 65157
There is at least one huge disadvantage here...
These conditions are non-SARGable!
This is a big one and for me would be a dealbreaker. The bitwise evaluations you need to perform are (to my knowledge) not indexable in any database - the engine needs to check every row to perform the evaluation, which means terrible performance.
Upvotes: 5