mpellegr

Reputation: 3182

Is using text as a primary key in an SQLite table bad?

Is it bad to have text as a primary key in an SQLite database? I heard that it's bad for performance reasons, is this true? And will the rowid be used as the actual primary key in such a case?

Upvotes: 48

Views: 56278

Answers (7)

Zoë Sparks

Reputation: 340

There's nothing intrinsically wrong with using a text primary key. What makes a primary key work is that it is orderable and has unique values in the table; other than that, the type of the data doesn't strictly matter. However, when the data is text, often the "real-world" source of the data rules out the practicality of using it as the primary key, for example because the values aren't guaranteed to be unique or can change over time. Using arbitrary, meaningless integers for the primary key means you don't have to worry about problems like that.

That's probably the nicest thing about integer primary keys, arguably more so than the speed of integer vs. string comparison. String comparison is generally a little more work for the computer than integer comparison, true, but that's unlikely to matter much in this context. SQLite creates an index for the primary key in any table, which means that even if you have a million entries, SQLite will only need to perform around 20 comparisons at worst to find the row (O(log n), and log2 of a million is about 20). It would take a really unusual case for that to have major performance implications, I'd say.
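If you want to check that a lookup by text primary key goes through the index instead of scanning the table, you can ask SQLite for its query plan. Here is a minimal sketch using Python's built-in sqlite3 module. The users table and its columns are made up for illustration, and the comparison count assumes a plain binary search over sorted keys, which only approximates what SQLite's B-tree does.

```python
import math
import sqlite3

# Rough upper bound on comparisons for a binary search over n sorted keys.
n = 1_000_000
comparisons = math.ceil(math.log2(n))

conn = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE username = ?",
    ("alice",),
).fetchall()
# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)

print(comparisons)
print(plan_text)
```

The plan text should mention the automatically created index (named sqlite_autoindex_users_1 here), which shows the lookup is a search and not a full scan. The exact wording of the plan differs between SQLite versions.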

On that note, something you might consider if you're planning on using a text primary key is SQLite's WITHOUT ROWID feature. A table with a text primary key is unlikely to need a rowid column, because the rowid is essentially an integer primary key. WITHOUT ROWID not only eliminates the rowid column, but also tells SQLite to base the search tree for the table itself on the primary key you specify instead of the rowid. Otherwise, it will create two search trees: the main search tree for the table itself using rowid keys, and a separate search tree for the text primary key associating text values with rowids. This wastes space and adds needless overhead to lookups using the text primary key, presuming you have no need for the rowid.

SQLite's docs for WITHOUT ROWID explain all this stuff. They give an example of a table storing word counts in a text corpus with the word as the primary key, which seems to me like a nice example of a situation where a text primary key makes sense.
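A rough sketch of that word-count idea, using Python's sqlite3 module. The table and column names and the count_words helper are my own for illustration, not copied from the docs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The word itself is the key, so the table is declared WITHOUT ROWID.
conn.execute(
    "CREATE TABLE wordcount ("
    "  word TEXT PRIMARY KEY,"
    "  cnt INTEGER NOT NULL"
    ") WITHOUT ROWID"
)

def count_words(text):
    """Hypothetical helper: tally whitespace-separated words into the table."""
    for word in text.lower().split():
        # Add the word with a zero count if it is new, then bump the count.
        conn.execute("INSERT OR IGNORE INTO wordcount VALUES (?, 0)", (word,))
        conn.execute("UPDATE wordcount SET cnt = cnt + 1 WHERE word = ?", (word,))

count_words("the quick brown fox jumps over the lazy dog the end")
the_count = conn.execute(
    "SELECT cnt FROM wordcount WHERE word = ?", ("the",)
).fetchone()[0]

# A WITHOUT ROWID table has no rowid column, so selecting it is an error.
try:
    conn.execute("SELECT rowid FROM wordcount")
    has_rowid = True
except sqlite3.OperationalError:
    has_rowid = False

print(the_count, has_rowid)
```

The INSERT OR IGNORE plus UPDATE pair is used here instead of an upsert clause only so the sketch also runs on older SQLite builds.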

Upvotes: 7

Ken Lin

Reputation: 1919

Although this thread discusses INTEGER vs TEXT primary keys, for context, see Blob vs. Text for primary keys circa 2021 where SQLite creator Richard Hipp replies. I've pasted and emphasized the relevant portion of his reply below.

(2) By Richard Hipp (drh) on 2021-03-04 16:00:22 in reply to 1 [source]

Both approaches should work fine. Storing the hash as a BLOB might be very slightly faster, since (as you observe) there is less content, hence less file I/O.

The Fossil version control system does something very much like this. But it stores the hash as text rather than as a blob. Performance is not an issue, and text is easier for developers to deal with when debugging.

Upvotes: 3

Sergio Cabral

Reputation: 6966

A PRIMARY KEY field implies comparing values. Comparing numbers is simpler than comparing text.

The reason is that there is a specific assembly instruction for 64 bit numeric comparison. This will always be much faster than comparing text which in theory can be unlimited in size.

Example comparing number:

CMP DX, 00  ; Compare the DX value with zero
JE  L7      ; If yes, then jump to label L7
.
.
L7: ...

Read more about CMP assembly instruction here: https://www.tutorialspoint.com/assembly_programming/assembly_conditions.htm

Knowing this tells us that numbers will always be more performant (at least in the computing we have today).

Upvotes: -5

Bipin

Reputation: 352

Yes, if you use TEXT you get android.database.sqlite.SQLiteConstraintException: UNIQUE constraint failed: TableName.ColumnName (code 1555)

The insert call returns the row ID of the newly inserted row if the insert is successful; otherwise it returns -1.

The return value is mapped to _ID, which is why you are pushed to implement the BaseColumns interface for the table.

It's strange that the insert call has to return the rowid instead of a boolean or similar.

I wish SQLite had a TEXT PRIMARY KEY capability.

Upvotes: -3

Segabond

Reputation: 1143

In the real world, using strings as primary keys has a lot of benefits if we are talking about UUIDs. Being able to create an entity's "passport" (its identity) exactly at the moment the entity is created can massively simplify asynchronous code and/or distributed systems (if we are talking about a more complex mobile client / server architecture).
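As a small sketch of that idea in Python: the key is generated on the client before the row exists, so other code can already refer to it. The entity table and its columns are hypothetical, picked only for this example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Hypothetical table with a UUID stored as text for the primary key.
conn.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, name TEXT)")

# The id is known before anything is written to the database, so it can be
# handed to other components (or sent to a server) right away.
new_id = str(uuid.uuid4())

conn.execute("INSERT INTO entity VALUES (?, ?)", (new_id, "example"))
name = conn.execute(
    "SELECT name FROM entity WHERE id = ?", (new_id,)
).fetchone()[0]

print(new_id, name)
```

With an autoincrementing integer key, the id would only be known after the insert has completed.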

As to the performance, I did not find any measurable difference when running a benchmark to perform 10000 primary key lookups, as in reality, database indexes neither store nor compare strings when running indexed searches.

Upvotes: 41

laalto

Reputation: 152817

Is it bad to have text as a primary key in an SQLite database? I heard that it's bad for performance reasons, is this true?

From a correctness point of view, TEXT PRIMARY KEY is all right.

From a performance point of view, prefer INTEGER keys. But as with any performance issue, measure it yourself to see if there's a significant difference with your data and use cases.
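One possible way to measure it, sketched with Python's sqlite3 module. The table names, the row count and the time_lookups helper are assumptions for illustration. An in-memory database with tiny rows will not reflect a real workload, so substitute your own schema and data.

```python
import sqlite3
import time
import uuid

N = 10_000  # number of rows and lookups; adjust to your use case

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE by_int (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE by_text (id TEXT PRIMARY KEY, payload TEXT)")

int_keys = list(range(1, N + 1))
text_keys = [str(uuid.uuid4()) for _ in range(N)]
conn.executemany("INSERT INTO by_int VALUES (?, 'x')", ((k,) for k in int_keys))
conn.executemany("INSERT INTO by_text VALUES (?, 'x')", ((k,) for k in text_keys))

def time_lookups(table, keys):
    """Hypothetical helper: return (seconds elapsed, rows found)."""
    query = f"SELECT payload FROM {table} WHERE id = ?"
    found = 0
    start = time.perf_counter()
    for key in keys:
        if conn.execute(query, (key,)).fetchone() is not None:
            found += 1
    return time.perf_counter() - start, found

int_seconds, int_found = time_lookups("by_int", int_keys)
text_seconds, text_found = time_lookups("by_text", text_keys)

print(f"INTEGER keys: {int_seconds:.4f}s for {int_found} lookups")
print(f"TEXT keys:    {text_seconds:.4f}s for {text_found} lookups")
```

The timings depend on the machine, the SQLite build and the data, so treat a single run as a rough indication only.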

And will the rowid be used as the actual primary key in such a case?

Only INTEGER PRIMARY KEY gets aliased with ROWID. Other kinds of primary keys don't, and there will be the implicit integer rowid unless WITHOUT ROWID is specified. Reference.
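A quick way to see the difference, as a sketch with Python's sqlite3 module (the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE int_pk (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE text_pk (code TEXT PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO int_pk VALUES (42, 'a')")
conn.execute("INSERT INTO text_pk VALUES ('xyz', 'b')")

# INTEGER PRIMARY KEY: rowid and id are the same value.
int_row = conn.execute("SELECT rowid, id FROM int_pk").fetchone()

# TEXT PRIMARY KEY: a separate integer rowid is assigned implicitly.
text_row = conn.execute("SELECT rowid, code FROM text_pk").fetchone()

print(int_row)
print(text_row)
```

In the first table the rowid is simply the value stored in id. In the second table the rowid is an integer chosen by SQLite and has nothing to do with the text key.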

Upvotes: 57

Simon Dorociak

Reputation: 33505

Is it bad to have text as a primary key in an SQLite database? I heard that it's bad for performance reasons, is this true?

I have never heard of anybody using a string as a primary key in a table. To me (and, I honestly hope, to others too) it is a very "ugly" practice with very low performance.

If you use a string as a primary key, you need to think about a "few" things:

  • Will a combination of 3 symbols be enough?
  • Or should I use 5 symbols?

Each row's key must have the same format (a readability issue, of course) and must also be unique. And here comes the next piece of "piggy work": you'll need to create some "unique string generator" that generates a unique [1] string identifier [2].

There are also further issues worth considering:

  • Longer strings are automatically harder and harder to compare
  • The size of the table grows radically, because a string clearly takes up much more space than a number
  • Number of rows: it's madness to use a string as a primary key if your table can have 1000+ rows

It's a more complex topic, but I would say that for very small tables it would be possible to use strings as a primary key (if it makes sense). Still, if you look at the disadvantages, using a number as the primary key is a much better technique for sure!

And what is the conclusion?

I don't recommend using a string as a primary key. It has more disadvantages than advantages (does it really have any advantage?).

Using a number as a primary key is a much better (I'm afraid to say the best) practice.

And will the rowid be used as the actual primary key in such a case?

If you use a string as the primary key, no.

[1] In reality, strings are rarely unique.

[2] Of course, you could say that you can create the identifier from the name of the item in the row, but that's spaghetti code once again (items can have the same name).

Upvotes: -46
