tremby
tremby

Reputation: 10109

Is an index on a SET column useful?

The following question is framed with a particular lean towards MySQL and PostgreSQL, but I'd also be interested in answers regarding other database systems.

I'm designing a database and the SET column type appears to fit the bill in a few cases. One such example could be expressed as a boolean column for each day of the week, and I'm thinking of instead using MySQL's SET, SET('Sun','Mon','Tue','Wed','Thu','Fri','Sat').

Is an index on such a SET column useful? Would it speed up searches for rows matching individual days of the week? Particular combinations of days of the week? Or would it only speed up searches for full exact binary values of the field (such as 0101010 for Mon/Wed/Fri)?

Upvotes: 1

Views: 122

Answers (2)

Rick James
Rick James

Reputation: 142560

MySQL A SET is implemented as an INT UNSIGNED of some length up to 8 bytes (64 items). The comments on the reference manual have lots of examples. Included are examples of how to treat the SET as the bits it is composed of.

Just as you cannot index "parts" of a number, you cannot really index parts of a SET.

SET('Sun','Mon','Tue','Wed','Thu','Fri','Sat') is notational convenience for a 7-bit number. And using 'Mon,Wed,Fri' to set 3 of the bits is debatably also a notitional convenience. Turning off a bit is really messy unless you think in bits and INTs and powers of 2.

If you don't already understand how binary numbers are composed of bits, then you will probably find SETs to be very hard to use.

There is one case where an INDEX might be worth having -- "covering". That is, an index that contains all the columns mentioned in a SELECT will probably run that SELECT faster. Example:

SELECT item FROM tbl WHERE FIND_IN_SET('Mon', my_set);
-- together with
INDEX(my_set, item)

That index will probably speed up finding the items that include Mondays. Scanning the "covering" index is likely to be faster than scanning the table.

Upvotes: 0

Evan Carroll
Evan Carroll

Reputation: 1

Using PostgreSQL

Logically, if you wanted to only test for = the binary solution is the fastest. But, that's not to useful.

If not, you're probably better storing them as

  1. an array of enum,
  2. just simply as individual boolean fields. You can even use a bloom index.

In PostgreSQL you can create an enum type and then have an array of enum types. An index will speed this up.

CREATE TYPE dow AS ENUM ('M', 'Tu', 'W', 'Th', 'F', 'Sa', 'Su' );
CREATE TABLE foo ( days dow[] );

This would permit you to find all available Mondays with

SELECT * FROM foo WHERE days @> ARRAY['M']::dow[];

Or, all Monday, Wednesday, and Friday

SELECT * FROM foo WHERE days @> ARRAY['M','W','F']::dow[];

Or you could make them bools, index them, and then do

SELECT * FROM foo WHERE has_monday AND has_wednesday AND has_friday; 

Upvotes: 2

Related Questions