wouterds
wouterds

Reputation: 921

URL Shortener algorithm

I recently bought myself a domain for personal URL shortening. And I created a function to generate alphanumeric strings of 4 characters as reference.

BUT

How do I check if they are already used or not? I can't check for every URL if it exists in the database, or is this just the way it works and I have to do it? If so, what if I have 13.000.000 URLs generated (out of 14.776.336). Do I need to keep generating strings until I have found one that is not in the DB yet?

This just doesn't look the right way to do it, anyone who can give me some advise?

Upvotes: 1

Views: 1993

Answers (3)

maniek
maniek

Reputation: 7307

One algorithm is to try a few times to find a free url of N characters, if still not found, increase N. Start with N=4.

Upvotes: 0

MoveFast
MoveFast

Reputation: 3025

One memory efficient and faster way I think of is following. This problem can be solved without use of database at all. The idea is that instead of storing used urls in database, you can store them in memory. And since storing them in memory can take a lot of memory usage, so we will use a bit set (an array of bits ) and we only one bit for each url.

  1. For each random string you generate, create a hashcode for that that lies b/w 0 and max number K.
  2. Create a bit set( basically a bit array). Whenever you use some url, set corresponding hash code bit in bit set to 1.
  3. Whenever you generate a new url, see if its hashcode bit is set. If yes, then discard that url and generate a new one. Repeat the process till you get one unused one.

This way you avoid DB forever, your lookups are extremely fast and it takes least amount of memory.

I borrowed the idea from this place

Upvotes: 3

biziclop
biziclop

Reputation: 49754

A compromise solution is to generate a random id, and if it is already in the database, find the first empty id that is bigger than it. (Wrapping around if you can't find any empty space in the range above.)

If you don't want the ids to be unguessable (you probably don't if you only use 4 characters), this approach works fine and is quick.

Upvotes: 0

Related Questions