derRobert
derRobert

Reputation: 571

mySQL SELECT rows where a specific bit of an integer is set

i have to do a select query in a posting table where a specific bit of an integer is set. The integer represents a set of categories in a bitmask: E.g.

1 => health
2 => marketing
3 => personal
4 => music
5 => video
6 => design
7 => fashion
8 => ......

Data example:

id | categories | title
1  | 11         | bla bla
2  | 48         | blabla, too

I need a mysql query that selects postings, that are marked with a specific category. Let's say "all video postings" This means i need a result set of postings where the 5th bit of the catgories column is set (e.g. 16,17,48 ....)

SELECT * FROM postings WHERE ....????

Any ideas ?

Upvotes: 13

Views: 20163

Answers (3)

Panagiotis Kanavos
Panagiotis Kanavos

Reputation: 131774

SQL (not just mySQL) is not suitable for bitwise operations. If you do a bitwise AND you will force a table scan as SQL will not be able to use any index and will have to check each row one at a time.

It would be better if you created a separate "Categories" table and a properly indexed many-to-many PostingCategories table to connect the two.

UPDATE

For people insisting that bitmap fields aren't an issue, it helps to check Joe Celko's BIT of a Problem.  At the bottom of the article is a list of serious problems caused by bitmaps.

Regarding the comment that a blanket statement can't be right, note #10 - it breaks 1NF so yes, bitmap fields are bad:

  1. The data is unreadable. ...
  2. Constraints are a b#### to write....
  3. You are limited to two values per field. That is very restrictive; even the ISO sex code cannot fit into such a column...
  4. There is no temporal element to the bit mask (or to single bit flags). For example, a flag “is_legal_adult_flg” ... A DATE for the birth date (just 3 bytes) would hold complete fact and let us compute what we need to know; it would always be correct, too. ...
  5. You will find out that using the flags will tend to split the status of an entity over multiple tables....
  6. Bit flags invite redundancy. In the system I just mentioned, we had “is_active_flg” and “is_completed_flg” in in the same table. A completed auction is not active and vice verse. It is the same fact in two flags. Human psychology (and the English language) prefers to hear an affirmative wording (remember the old song “Yes, we have no bananas today!” ?). All of these bit flags, and sequence validation are being replaced by two sets of state transition tables, one for bids and one for shipments. For details on state transition constraints. The history of each auction is now in one place and has to follow business rules.
  7. By the time you disassemble a bit mask column, and throw out the fields you did not need performance is not going to be improved over simpler data types. 
  8. Grouping and ordering on the individual fields is a real pain. Try it.
  9. You have to index the whole column, so unless you luck up and have them in the right order, you are stuck with table scans.
  10. Since a bit mask is not in First Normal Form (1NF), you have all the anomalies we wanted to avoid in RDBMS.

I'd also add, what about NULLs? What about missing flags? What if something is neither true or false?

Finally, regarding the compression claim, most databases pack bit fields into bytes and ints internally. The bitmap field doesn't offer any kind of compression in this case. Other databases (eg PostgreSQL) actually have a Boolean type that can be true/false/unknown. It may take 1 byte but that's not a lot of storage and transparent compression is available if a table gets too large.

In fact, if a table gets large the bitmap fields problems become a lot more serious. Saving a few MBs in a GB table is no gain if you are forced to use table scans, or if you lose the ability to group

Upvotes: 1

Mike Christensen
Mike Christensen

Reputation: 91724

How about

SELECT * FROM postings WHERE (categories & 16) > 0; -- 16 is 5th bit over

One issue with this is you probably won't hit an index, so you could run into perf issues if it's a large amount of data.

Certain databases (such as PostgreSQL) let you define an index on an expression like this. I'm not sure if mySQL has this feature. If this is important, you might want to consider breaking these out into separate Boolean columns or a new table.

Upvotes: 5

Marcus Adams
Marcus Adams

Reputation: 53880

You can use bitwise operators like this. For video (bit 5):

WHERE categories & 16 = 16

Substitute the value 16 using the following values for each bit:

1 = 1
2 = 2
3 = 4
4 = 8
5 = 16
6 = 32
7 = 64
8 = 128

This goes from least significant bit to highest, which is opposite of the way most programmers think. They also start at zero.

Upvotes: 19

Related Questions