Leo Jiang

Reputation: 26243

Why do database schemas often contain 32, 64, 128, etc.?

I understand that 2, 4, 8, 16, 32, 64, 128, 256... are powers of two, i.e. the "round" numbers of binary.

Is there a reason why these are used in databases? For example, VARCHAR fields are often 255 characters long. Since (I'm assuming) each character is one byte, why is there a difference between using 255 characters and using 257 characters?

Upvotes: 11

Views: 11162

Answers (4)

Aaron Bertrand

Reputation: 280644

It is more habit than anything. There is nothing magic about varchar(32) or varchar(64); similarly, there is nothing magic about the defaults the visual tools try to make you use instead (e.g. varchar(50)). A lot of these upper bounds have been ingrained into people's heads since the days when 640K was supposed to be enough memory for anyone and we really needed to worry about every single byte.

In a lot of cases it comes down to finding common ground. In a previous system I worked on, the product managers had no idea what their requirements were. They wanted to store a name, but they didn't know what the domain of names really consisted of - one of them stated that he had heard of a last name longer than 50 characters, so we knew it had to be more than 32, and more than 50. We came back with 64, he agreed that was enough, and that is what is still there today AFAIK.

We did have a technical reason for e-mail (varchar(320)): at the time, the standard dictated 320 characters - 64 characters for the username/local part, 255 characters for the domain name, and 1 character for the @. Most other decisions were based on precedent (e.g. all subsequent names followed the nvarchar(64) model decided above) or logic (e.g. URLs didn't need to be nvarchar(max); depending on the standard and browser capabilities at the time, I believe they were either varchar(2048) or varchar(4096)). In those cases it was a power of 2 not because we chose one, but because someone else's software or standards were built to use a power of 2.
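As a sketch of that arithmetic (the table and column names here are illustrative, not from the original system):

    -- 64 (local part) + 1 ('@') + 255 (domain name) = 320 characters total.
    CREATE TABLE users (
        email VARCHAR(320) NOT NULL
    );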

Upvotes: 1

Bohemian

Reputation: 425428

With varchar columns, the length is stored alongside the data, as an unsigned integer in the leading bytes. The fewest bytes possible are used: one byte can store lengths from 0 to 255, two bytes lengths from 0 to 65535, and so on. By making the length 255, you get the "most value" out of the minimum one length byte.
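For example, here is a MySQL-flavoured sketch (assuming a single-byte character set such as latin1; with multi-byte character sets the prefix size is chosen by the column's maximum byte length, not its character count):

    -- Illustrative table; the column names are made up for the demo.
    CREATE TABLE prefix_demo (
        a VARCHAR(255) CHARACTER SET latin1,  -- max 255 bytes: 1-byte length prefix
        b VARCHAR(256) CHARACTER SET latin1   -- max 256 bytes: 2-byte length prefix
    );

So 255 is the largest declared length whose length still fits in a single byte.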

In days gone by, saving even a single byte of disk per row was worthwhile. Although disk is cheap now, the thinking has remained, especially among grey-haired DBAs.

There is no advantage in choosing a length that is a power of 2, for example varchar(64) - it is merely a habit/convention (I even follow it - and I don't know why!).

Upvotes: 12

Sergey Kalinichenko

Reputation: 727077

The data in databases is often organized in pages. These pages are almost universally aligned with memory boundaries, which simplifies memory and cache management. Choosing 2^n sizes for your data can help optimize the use of space in your database.

Note: Depending on the RDBMS engine, 256 may not be the best choice for variable-length strings from the memory-alignment perspective, because the length prefix takes space as well: a varchar(256) can take up to 258 bytes (256 for the data plus a two-byte length).
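A quick sketch of that bookkeeping (worst case, single-byte character set; the exact prefix rules vary by engine):

    -- VARCHAR(255): 255 data bytes + 1 length byte  = 256 bytes (a power of two)
    -- VARCHAR(256): 256 data bytes + 2 length bytes = 258 bytes (no longer round)
    CREATE TABLE alignment_demo (
        v VARCHAR(256)  -- up to 258 bytes despite the "round" declared length
    );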

Upvotes: 1

Dave Griffith

Reputation: 20515

Not merely database schemas but pretty much any programming artifact will be found to contain many numbers of the form 2^N or 2^N-1. While some of these uses make sense (e.g. 2^32-1 being the largest number representable as a standard unsigned integer on many machine architectures), most uses of powers of 2 are less necessary. In practice, old hackers view powers of 2 as holy, and venerate them as such.
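For instance, taking MySQL's integer types as one concrete case (the table here is purely illustrative):

    -- A 32-bit unsigned integer tops out at 2^32 - 1 = 4294967295,
    -- which is how machine word sizes leak into schemas.
    CREATE TABLE word_size_demo (
        hits INT UNSIGNED  -- valid range: 0 .. 4294967295 (2^32 - 1)
    );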

Upvotes: 1
