Alexei
Alexei

Reputation: 493

MySQL key dilemma

I have a table that caches data (shared hosting so no memcached) to a MySQL table.

The concept is this:

I have a page that loads (static) data and then cache then:

The cache table is like this:

=cache=
url varchar(255) - primary key
page mediumtext

Now I think I'm doing the right thing, based on what I have (shared host, no caching like memcached, etc.) but my question is this:

Because the URL is a varchar index but because numeric IDs (like int) are faster, is there a way to convert a URL like /contact-us/ or /product-category/product-name/ to a unique integer? Or is there any other way to optimize this?

Upvotes: 2

Views: 103

Answers (3)

Vincent Vanmarsenille
Vincent Vanmarsenille

Reputation: 504

select all cached urls with a hash, then search for exact url in all hash colisions

select page from (select * from cache where HASHEDURL = STOREDHASH) where url = 'someurl'

Upvotes: 0

iblue
iblue

Reputation: 30434

Changing your url column to a fixed-size string would make indexing slightly faster, if there wasn't another dynamically-sized column (TEXT) in the table. Converting it to an integer would be possible, depending on your URL structure - you could also use some kind of hash function. But why don't you make your life easier?

You could save your cache results directly to the disk and create a mod_rewrite filter (put it tou your .htaccess file), that matches if the file exists, otherwise invokes the PHP script. This would have 2 advantages:

  1. If the cache is hot, PHP will not run. This saves time and memory.
  2. If the file is requested often and it is small enough (or you have lots of RAM), it will be held in the RAM. This is much faster than MySQL.

Upvotes: 0

I would create some form of hash which would allow a shorter key. In many cases something simple like a hash of the request path may be viable. Alternatively something even simpler like CRC32('/your/path/here') may be suitable in your situation as a primary key. In this example the following columns would exist

 urlCRC INT(11) UNSIGNED NOT NULL (PRIMARY KEY)
 url VARCHAR(255) NOT NULL
 page MEDIUMTEXT

You could then take this a step further, and add trigger BEFORE INSERT which would calculate the value for urlCRC, i.e. containing

NEW.urlCRC = CRC32(NEW.url)

You could then create a stored procedure which takes as argument inURL (string), and internally it would do

SELECT * FROM cacheTable WHERE urlCRC = CRC32(inURL);

If the number of rows returned is 0, then you can trigger logic to cache it.

This may of course be overkill, but would provide you a numeric key to work on which, assuming there are no conflicts, would suffice. By storing the url as VARCHAR(255) also then if a conflict does occur, you can easily regenerate new hashes using a different algorithm.

Just to be clear, I just use CRC32() as an example off the top of my head, chances are there are more suitable algorithms. The main point to take away is that a numeric key is more efficient to search so if you can convert your strings into unique numerics it would be more efficient when retrieving data.

Upvotes: 2

Related Questions