localhost
localhost

Reputation: 1273

Database table design - Are there any problems with using a long column as a primary key?

Forgive me for what is probably a stupid or obvious question - I'm new to databases.

I'm planning to store file path links to on-disk media files in a Derby database from java but I'm curious about the best way to set up the tables.

Just to clarify I do not intend to store the actual media in the database, only file paths.

The table will contain in the order of 10k-100k rows.

I believe that the file path should be the primary key as it uniquely identifies each media file.

What are the best options for setting up a table with file paths and to be able to efficiently search (mostly for a substring in the filename, but also for media attributes)?

I am planning to use VARCHAR(4096) as maximum linux path length is 4096 characters.

Are there any pros or cons in creating a table in this way, with an index on what could be quite a long VARCHAR column? How do you suggest I should design the tables?

Thanks!

Upvotes: 0

Views: 150

Answers (2)

The Impaler
The Impaler

Reputation: 48770

Disclaimer: This is a very personal opinion and probably many people will disagree.

You are considering using a "natural key", and I'm against using them. A natural key is an existing property of an object that identifies it uniquely... until it doesn't.

It's like my full name, of my identity number in my country. Those properties seem to be unique, but the problem is that they are not stable. They are existing, known properties that are visible; this visibility make them vulnerable to change. This mean, they will change in time. Will I be the same person if I change my name?

Also a key is usually used to be somewhat linked to other tables. A big PK is not great for that. But this is more of a practical issue.

I would recommend you to use a simple INT or BIGINT as the primary key and add a UNIQUE constraint to the path property. This way your model would be more flexible. If the media is moved to another path, you just will need to update a single value in the table; if the path were the PK then you would need to update all foreign keys related to it.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269623

Do not use a long character string as a primary key.

Use a synthetic primary key.

Here are some reasons:

  • One important purpose of primary keys is to support foreign keys. You don't want to have 4k strings lurking all over your database, when you could just have a 4 byte integer.
  • Another important reason for primary keys is to uniquely find each row. Most people I know don't want to have to type 4k characters to identify a row. I type fast and that would take time for me. And I'm sure I would make a typo somewhere along the way.
  • Two strings might only differ in, say, the 2017th character. I wouldn't want to have to figure out they are different, especially if the character is a 1 versus l or O versus 0.

Define an auto incrementing/identify/serial primary key. You can always declare the URL as unique so it is not duplicated (although some databases may not allow such a long key in an index).

Upvotes: 1

Related Questions