Nicster15
Nicster15

Reputation: 169

MySQL/PostgreSQL Column Sizes, Why?

I'm developing a program and ran into a bug where inserting a value in a tables column, that has the type int, and the value is larger than Integer.MAX_VALUE it spits out an error saying the number is too large. I read that the fix for this is to quite simply just alter the table to BigInt and that should fix it. But that made me thinking, why don't all programmers just use the max column values (such as Varchar(255), BigInt, etc.) rather than something smaller like Varchar(30) or Int?

Wouldn't this almost completely eliminate an error like mine occurring when you're not sure of whats going to be inserted, especially if it's based off of users input? Is there any cons into just using the largest possible type you need for the columns? Would the table size be bigger even if you just "2" in a big int column (even though that would work with int?). Is there a performance loss?

Thanks!

Upvotes: 1

Views: 269

Answers (3)

Dan Guzman
Dan Guzman

Reputation: 46213

why don't all programmers just use the max column values (such as Varchar(255), BigInt, etc.) rather than something smaller like Varchar(30) or Int?

Some do exactly that. It's also not at all uncommon to see developers store numeric or date/time values in varchar columns too.

I often see performance and storage costs called out as the reason not to do this. Those are considerations (which vary by DBMS) but a more important one in the world of relational databases is data integrity. The chosen datatype is a critical part of the data model because it determines the domain of data that can be stored. On top of that, relational databases provide check, referential, and NULL constraints to further limit column values.

Wouldn't this almost completely eliminate an error like mine occurring when you're not sure of whats going to be inserted, especially if it's based off of users input?

Of course, but why stop at a 64-bit integer? Why not NUMERIC(1000)? That's a rhetorical question to point out that one must know about the business domain so data can be properly modeled and validation rules enforced. A 64-bit integer is certainly overkill to store a person's number of children but you may end up with a value of several billion due to careless data entry. The column data type is the last defense for bad data and is especially important when it's based off of users input.

All that being said, one can use a RDBMS as nothing more than a dumb storage engine and enforce data integrity rules (if any) in application code. In that case, storage and performance are the only consideration.

Upvotes: 2

Jason A. Long
Jason A. Long

Reputation: 4442

I can't speak to any RDBMS other than SQL Server (but I imagine this applies to all of them)... A BIG INT takes up twice as much space as an INT... which means less data fitting onto a page meaning less data in cache meaning slower performance.

In SQL Server there are actually 4 INT types:

TINYINT (1 byte),

SMALLINT (2 bytes),

INT (4 bytes),

BIGINT (8 bytes).

A good database developer will put very careful thought into choosing the proper data type based on the data that's expected to be put in the column. Aside from the issue of storage space, data types function as data constraints. So if I choose TINYINT as my data type, that means I only expect to see values between 0 and 255 and will reject anything that falls outside of that range.

If a coworker were to submit a table design with all VARCHAR(255) & BIGINTs, I'd reject it and have them size everything appropriately. It's lazy thinking like that, that causes huge problem on the DB side of the house.

Upvotes: 2

Xedni
Xedni

Reputation: 4695

For Varchar, the reason you generally don't just use MAX is because it stores it differently and puts limitations on your index maintenance operations. For instance, you cannot rebuild an index "online" with a varchar(max) field on it. While there's a little hand waving involved, basically varchar(max) data gets stored off row so there's overhead in maintaining that extra data store.

For numeric types, the main thing is space. Bigint is an 8 byte signed integer whereas an int is only 4 bytes. If you dont need a space bigger than 2.4 billion, that's just wasted space (and often a lot of it if you have, say, 2.4 billion rows of data).

Data Compression can solve some of those issues, but not without the cost of having to de-compress the data when it's queried.

So the reasons are varied, but with the possible exception of using larger size varchars (not varchar(max)), picking the "right" data type for your data is just a good idea.

Upvotes: 4

Related Questions