Reputation: 4014
Quick question. Does it matter, from a storage point of view, whether I use decimal field limits or power-of-two ones (say 16, 32, 64 instead of 10, 20, 50)?
I ask because I wonder whether this has anything to do with clusters on the HDD.
Thanks!
Upvotes: 15
Views: 14968
Reputation: 7335
If this were a C program, I'd spend some time thinking about that, too. But with a database I'd leave it to the DB engine.
DB programmers have spent a lot of time thinking about the best memory layout, so just tell the database what you need and it will (usually) store the data in the way that suits the DB engine best.
If you want to align your data, you need exact knowledge of the internal data organization: How is the string stored? Are one, two, or four bytes used to store the length? Is it stored as a plain byte sequence or encoded in UTF-8, UTF-16, or UTF-32? Does the DB need extra bytes to identify NULL or > MAXINT values? Maybe the string is stored as a NUL-terminated byte sequence, in which case one more byte is needed internally.
Also, with VARCHAR it is not necessarily true that the DB will always allocate 100 (128) bytes for your string. Maybe it stores just a pointer to where the space for the actual data is.
So I'd strongly suggest using VARCHAR(100) if that is your requirement. If the DB decides to align it somehow, there's room for extra internal data, too.
The other way around: let's assume you use VARCHAR(128) and everything comes together. The DB allocates 128 bytes for your data. Additionally, it needs 2 more bytes to store the actual string length, which makes 130 bytes, and then the DB might align the data to the next (let's say 32-byte) boundary: the space actually needed on disk is now 160 bytes 8-}
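The arithmetic in that worst case can be checked with a few lines of Python. This is only a sketch of the hypothetical scenario described above: the 2-byte length prefix and the 32-byte boundary are the assumed values from this answer, not the behaviour of any particular database, and `align_up` is a helper name made up for illustration.

```python
def align_up(size: int, boundary: int) -> int:
    """Round size up to the next multiple of boundary."""
    return -(-size // boundary) * boundary


LENGTH_PREFIX = 2  # assumed: bytes used to store the string length
BOUNDARY = 32      # assumed: alignment boundary in bytes

for declared in (100, 128):
    raw = declared + LENGTH_PREFIX
    # Prints the declared size, the size with prefix, and the aligned size.
    print(declared, raw, align_up(raw, BOUNDARY))
```

Under these assumptions a fully allocated VARCHAR(100) lands on 128 bytes, while VARCHAR(128) spills over to 160.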
Upvotes: 3
Reputation: 328624
Yes, but it's not that simple. Sometimes 128 can be better than 100, and sometimes it's the other way around.
So what is going on? varchar only allocates space as necessary, so if you store hello world in a varchar(100), it will take exactly the same amount of space as in a varchar(128).
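A toy model makes the point concrete. It assumes the value is stored as its UTF-8 bytes plus a 1-byte length prefix and that the declared limit is counted in bytes; real engines differ in all of these details, and `stored_size` is a hypothetical helper, not a database API.

```python
def stored_size(value: str, declared_max: int, length_prefix: int = 1) -> int:
    """Bytes used by a toy VARCHAR: a length prefix plus the encoded data."""
    data = value.encode("utf-8")
    if len(data) > declared_max:
        raise ValueError("value does not fit in the column")
    # The declared maximum only acts as a limit; it does not change the size.
    return length_prefix + len(data)


print(stored_size("hello world", 100))  # 12
print(stored_size("hello world", 128))  # 12
```

In this model the declared maximum is only checked, never allocated, which is why both calls give the same result.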
The question is: If you fill up the rows, will you hit a "block" limit/boundary or not?
Databases store their data in blocks. These have a fixed size, for example 512 bytes (this value can be configured for some databases). So the question is: how many blocks does the DB have to read to fetch each row? Rows that span several blocks need more I/O, which slows you down.
But again: this doesn't depend on the theoretical maximum size of the columns but on a) how many columns you have (each column needs a little bit of space even when it's empty or null), b) how many fixed-width columns you have (number/decimal, char), and finally c) how much data you have in the variable-width columns.
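The three factors can be put into a rough estimate. The sketch below assumes 1 byte of overhead per column, the example 512-byte block from above, and a row that starts at a block boundary; all of these are illustrative assumptions, and `row_size` and `blocks_needed` are made-up helper names.

```python
import math


def row_size(fixed_widths, variable_lengths, per_column_overhead=1):
    """Estimate a row's size from its fixed columns and actual variable data."""
    columns = len(fixed_widths) + len(variable_lengths)
    return sum(fixed_widths) + sum(variable_lengths) + columns * per_column_overhead


def blocks_needed(size, block_size=512):
    """Minimum number of blocks a row of this size has to span."""
    return math.ceil(size / block_size)


# Two fixed-width columns (4 and 8 bytes) plus two variable-width columns.
short_row = row_size([4, 8], [11, 40])    # little data in the variable columns
long_row = row_size([4, 8], [300, 250])   # same schema, much more data
print(short_row, blocks_needed(short_row))
print(long_row, blocks_needed(long_row))
```

Both rows use the same schema; only the amount of data actually stored in the variable columns decides whether a row stays within one block.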
Upvotes: 2
Reputation: 754060
VARCHAR(128) is better than VARCHAR(100) if you need to store strings longer than 100 bytes.
Otherwise, there is very little to choose between them; you should choose the one that better fits the maximum length of the data you might need to store. You won't be able to measure the performance difference between them. All else apart, the DBMS probably only stores the data you send, so if your average string is, say, 16 bytes, it will only use 16 (or, more likely, 17 - allowing 1 byte for storing the length) bytes on disk. The bigger size might affect the calculation of how many rows can fit on a page - detrimentally. So choosing the smallest size that is adequate makes sense - waste not, want not.
So, in summary, there is precious little difference between the two in terms of performance or disk usage, and aligning to convenient binary boundaries doesn't really make a difference.
Upvotes: 12