Hernantz
Hernantz

Reputation: 566

SQL: best way to store yes/no values? Looking after performance in huge databases

I have some columns where I have to store basically yes/no values. For example user status for active or inactive. Newsletter suscription status for suscribed or unsuscribed.

Well I want to know (considering tables with a lot of records) if the best way is to put a tiny int with char length of 1 and set 1 for yes, and 0 for no.

Is this a correct thought? Or there are no impact in the performance of db queries when using just words like yes, no, active, inactive, suscribed, etc.

thanks in advance.

Upvotes: 4

Views: 18457

Answers (6)

Ronnis
Ronnis

Reputation: 12833

Are you just asking in general, what the most efficient way to store a yes/no flag is? Or do you have a performance problem at hand?

If so, when do you have the performance problem (specific queries, inserts, maintenance etc)? What kind of performance gain are you looking for? 2%? 10%? 50%?

Changing datatypes will likely result in only a minor improvement unless we are talking about several hundred million rows. I will give you an example. Let's say that whatever change you did, you shaved of 3 bytes per row. Let's say the table contains 100,000,000 rows. That would be a saving of ~285 mb. Assuming the disk subsystem can provide you 100mb/s you have saved a whopping 3 seconds for a full table scan. Something tells me that the users would think 2 hours and 3 seconds vs 2 hours is same same :)

Upvotes: 1

Pavel Urbančík
Pavel Urbančík

Reputation: 1496

If your RDBMS supports bitmap indexes, go for BIT every time. If it doesn't, use whatever you want, there is really no difference between char(1), tinyint (byte).

Upvotes: 1

ClosureCowboy
ClosureCowboy

Reputation: 21541

Semantically, I suggest you use bit if it's available to you. When looking at the column, any other developer can immediately determine that a boolean value is stored in it. If you don't have bit, try using tinyint. Ensuring that 1 is the only true value and 0 is the only false value will bring consistency. Otherwise, you could end up with a messy mixture of true/false, yes/no, valid/invalid, y/n, and/or t/f.

Comparing bit or tinyint values probably isn't slower than comparing strings, and even if it were slower than comparing strings, I can't imagine it having a significant effect on overall speed.

Upvotes: 9

Chase
Chase

Reputation: 69201

My intuition would have said performance would have been better with tinyints, but this post doesn't really bare that thought out. This SO post also offers some other interesting opinions.

I do think that performing analysis with data stored as numbers is typically easier than character data. What other programs are you going to have to interface with and use? For example, several of my analysis tools do not read character data at all, so we have to recode any data we receive in the format of "yes", "no", etc.

Upvotes: 0

OMG Ponies
OMG Ponies

Reputation: 332591

The most commonly supported means is to use CHAR(1) - on most databases, it takes the same amount of space as BIT (assuming BIT is available, 1 byte) but supports more values (26 if case insensitive, 52 if not) if there's any chance of supporting more values. Unlike BIT, CHAR(1) is human readable. Also, BIT isn't supported on every database.

Upvotes: 3

John Saunders
John Saunders

Reputation: 161783

Is there something you don't like about the 'bit' data type?

Upvotes: 3

Related Questions