user2779312
user2779312

Reputation: 661

Random String generation - avoiding duplicates

I am using the below code to generate random keys like the following TX8L1I

public string GetNewId()
        {
            string result = string.Empty;

            Random rnd = new Random();
            short codeLength = 5;
            string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
            StringBuilder builder = new StringBuilder(codeLength);

            for (int i = 0; i < codeLength; ++i)
                builder.Append(chars[rnd.Next(chars.Length)]);

            result = string.Format("{0}{1}", "T", builder.ToString()); 

            return result;
        }

Every-time a key is generated a correspondent record is created on the database using the generated result as primary key. Off course this is not safe since a primary key violation might occur.

What's the correct approach to this? Should I load all existing keys and verify against tha t list if the key already exist and if yes generate another one?

Or maybe a more efficient way would be to move this logic into the database side? But even on the database side I still need to check that the random key generated does not exist on the current table.

Any ideas?

Thanks

Upvotes: 1

Views: 1749

Answers (6)

Jon Hanna
Jon Hanna

Reputation: 113392

well my decision to use the random is that I don't want the users to be able to know how the keys are generated (format), since the website is public and I don't want users to be trying to access keys that.

You've merged two problems into one that are really separate:

  1. Identify a row in a database.
  2. Have an identifier that is passed to and from clients, that matches this, but is not predictable.

This is a specific case of a more general set of two problems:

  1. Identify a row in a database.
  2. Have an identifier that is passed to and from clients, that matches this.

In this case, when we don't care about guessing, then the easiest way to deal with it is to just use the same identifier. E.g. for the database row identified by the integer 42 we use the string "42" and the mapping is a trivial int.Parse() or int.TryParse() in one direction and a trivial .ToString() or implicit .ToString() in the other.

And that's therefore the pattern we use without even thinking about it; public IDs and database keys are the same (perhaps with some conversion).

But the best solution to your specific case where you want to prevent guessing is not to change the key, but to change mapping between the key and the public identifier.

First, use auto-incremented integers ("IDENTITY" in SQL Server, and various similar concepts in other databases).

Then, when you are sending the key to the client (i.e. using it in a form value or appending it to a URI) then map it as so:

private const string Seed = "this is my secret seed ¾Ÿˇʣכ ↼⊜┲◗ blah balh";
private static string GetProtectedID(int id)
{
  using(var sha = System.Security.Cryptography.SHA1.Create())
  {
    return string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(id.ToString() + Seed)).Select(b => b.ToString("X2"))) + id.ToString();
  }
}

For example, with an ID of 123 this produces "989178D90470D8777F77C972AF46C4DED41EF0D9123".

Now map back to a the key with:

private static bool GetIDFromProtectedID(string str, out int id)
{
  int chkID;
  if(int.TryParse(str.Substring(40), out chkID))
  {
    using(var sha = System.Security.Cryptography.SHA1.Create())
    {
      if(string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(chkID.ToString() + Seed)).Select(b => b.ToString("X2"))) == str.Substring(0, 40))
      {
        id = chkID;
        return true;
      }
    }
  }
  id = 0;
  return false;
}

For "989178D90470D8777F77C972AF46C4DED41EF0D9123" this returns true and sets the id argument to 123. For "989178D90470D8777F77C972AF46C4DED41EF0D9122" (because I tried to guess the ID to attack your site) it returns false and sets id to 0. (The correct ID for key 122 is "F8AD0F55CA1B9426D18F684C4857E0C4D43984BA122", which is not easy to guess from having seen that for 123).

If you want, you can remove some of the 40 characters of the output to produce smaller IDs. This makes it less secure (an attacker has fewer possible IDs to brute-force) but should still be reasonable for many uses.

Obviously, you should used a different value for Seed than here, so that someone reading this answer can't use it to predict your ID; change the seed, change the ID. Seed cannot be changed once set without altering every ID in the system. Sometimes that's a good thing (if identifiers are never meant to have long-term value anyway), but normally it's bad (you've just 404d every page in the site that used it).

Upvotes: 2

Dmitrii Bychenko
Dmitrii Bychenko

Reputation: 186843

You've commited a typical sin with Random: one shouldn't create Random instance each time they want to generate random value (otherwise one'll get badly skewed distribution with many repeats). Put Random away form function:

// Let Generator be thread safe
private static ThreadLocal<Random> s_Generator = new ThreadLocal<Random>(
 () => new Random());

public static Random Generator {
  get {
    return s_Generator.Value;
  }
}

// For repetition test. One can remove repetion test if 
// number of generated ids << 15000
private HashSet<String> m_UsedIds = new HashSet<String>();

public string GetNewId() {
  int codeLength = 5;
  string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

  while (true) {
    StringBuilder builder = new StringBuilder(codeLength);

    for (int i = 0; i < codeLength; ++i)
      builder.Append(chars[Generator.Next(chars.Length)]); // <- same generator for all calls

    result = string.Format("{0}{1}", "T", builder.ToString()); 

    // Test if random string in fact a repetition
    if (m_UsedIds.Contains(result))
      continue;

    m_UsedIds.Add(result);

    return result;
  }
}

For codeLength = 5 there're Math.Pow(chars.Length, codeLength) possible strings (Math.Pow(36, 5) == 60466176 = 6e7). According to birthday paradox you can expect 1st repetition at about 2 * sqrt(possible strings) == 2 * sqrt(6e7) == 15000. If it's OK, you can skip the repetition test, otherwise HashSet<String> could be a solution.

Upvotes: 1

Pierre
Pierre

Reputation: 9082

You can do the following:

Generate your string as well as concatenate the following onto it

string newUniqueString = string.Format("{0}{1}", result, DateTime.Now.ToString("yyyyMMddHHmmssfffffff"));

This way, you will never have the same key ever again!

Or use

var StringGuid = Guid.NewGuid();

Upvotes: 1

user3036342
user3036342

Reputation: 1045

I would move the logic to the database instead. Even though the database still have to check for the existence of the key, it's already within the database operation at that point, so there is no back-and-forth happening.

Also, if there's an index on this randomly generated key, a simple if exists will be quick enough to determine if the key has been used before or not.

Upvotes: 1

Soner G&#246;n&#252;l
Soner G&#246;n&#252;l

Reputation: 98868

I think you should use Random instance outside of your method.

Since Random object is seeded from the system clock, which means that if you call your method several times very quickly, it will use the same seed each time, which means that you'll get the same string at the end.

Upvotes: 3

Simon Whitehead
Simon Whitehead

Reputation: 65107

Move the decision to the database. Add a primary key of type uniqueidentifier. Then its a simple "fire and forget" algorithm.. the database decides what the key is.

Upvotes: 3

Related Questions