Reputation: 14003
What would be a good/safe maximum length for a VARCHAR column as primary key not being much/any slower than an INTEGER ID using MySQL 5 + InnoDB on a 64 bit system? Note, that this PK should be assumed to be referenced by other tables, so it will appear in a number of JOINs.
Would a VARCHAR(7) be a good length? 6? 8? 10? More? Less? Why?
It might be hard to answer, but there should at least be an upper limit based on facts, e.g. based on the inner workings of MySQL/InnoDB (index structures, ... ?).
Edit: Assume the ASCII character encoding, case sensitive.
Upvotes: 1
Views: 1054
Reputation: 95522
What would be a good/safe maximum length for a VARCHAR column as primary key not being much/any slower than an INTEGER ID using MySQL 5 + InnoDB on a 64 bit system? Note, that this PK should be assumed to be referenced by other tables, so it will appear in a number of JOINs.
Not much slower to do what? SELECT? UPDATE? INSERT? My internal users want faster inserts; my web users want faster selects.
In part, performance (whatever that means) depends on your particular database structure, your particular query patterns, and your particular server hardware. What did your own tests show you?
It might be hard to answer, but there should at least be an upper limit based on facts, e.g. based on the inner workings of MySQL/InnoDB (index structures, ... ?).
If there were an upper limit, you wouldn't be able to count on it remaining unchanged across even minor version upgrades. And, of course, the query optimizer makes decisions at run time, not at design time. Basing long-term database design decisions on dbms internals as they exist today is not a Best Practice. (That's just an observation. I'm not implying that's what you're doing, but a lot of people who read this are liable to do just that.)
If you want to know how a particular set of tables perform, it makes sense to edit your question and include the DDL. That way we're at least talking about the same thing. As it is now, everybody that answers is probably going to be using a different structure. (If they bother to test at all.) And we might not reveal our private--and sometime unwarranted--assumptions.
In one specific case--from another SO question--using id numbers and joins took 100 times as long to execute as using a natural key. (32-bit PostgreSQL.) So people can talk all day about how many CPU instructions it takes to compare integers or strings, or the number of bytes in an integer, or the number of bytes in UTF-8 collations, or whatever. Nevertheless, in that specific case, VARCHAR(30) won by a landslide.
When I was in the military, we had a saying. "When your map and the terrain disagree, follow the terrain."
If theory and measurements disagree, follow the measurements. Develop rules of thumb from measurements.
Upvotes: 2
Reputation: 360572
Using an int field is generally preferable for keys. Two numbers can be compared in a single assembler instruction on pretty much any platform in existence. Comparing two strings will invariably require a loop and extra setup steps in advance, or take multiple cpu cycles even if the cpu has string comparison instructions built in.
Upvotes: 3
Reputation: 62359
Anything larger than VARCHAR(4) in ASCII encoding (and ascii_bin collation - you don't want case insensitive collation for relational operations) will be slower than INT (because INT is 4 bytes long)
Upvotes: 2