LStrike

Reputation: 1648

Java: Generate unique ID from regular expression

I need some help.

Is there a common way to generate a unique ID from a regular expression? I need to create an identifier that matches the following regex:

[A-N|P-Z|1-9]{10}

I have no idea where to start.

Regards LStrike

Upvotes: 0

Views: 2841

Answers (3)

tucuxi

Reputation: 17945

You have no guarantee of uniqueness by construction, because only a limited number of valid IDs satisfy that regex; so you should check that each ID is indeed unique before using it. I assume that you want to generate non-sequential IDs (that is, AAAAAAAAAB following AAAAAAAAAA is not desired).

Possible code:

import java.util.Random;

String generateID(String valid, int length, Random r) {
    StringBuilder sb = new StringBuilder();
    while (sb.length() < length) {
        // append one randomly chosen character from the valid set
        sb.append(valid.charAt(r.nextInt(valid.length())));
    }
    return sb.toString();
}

Converting the regex into a string with all valid characters (the valid parameter above) requires parsing the regex; but assuming that it is of the form [list-of-chars]{number-of-chars}, as in the question, you can test every candidate character against the character class and keep the ones that match:

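import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Random;

String generateFromRegex(String regex, Random r) {
   String charsRegex = regex.replaceAll("[{].*", ""); // strip off repetition count
   StringBuilder valid = new StringBuilder();
   final Charset charset = Charset.forName("US-ASCII"); // assume us-ascii
   for (int i = 0; i < 255; i++) {
     ByteBuffer bb = ByteBuffer.allocate(4);
     bb.putInt(i);
     // decode the code point; trim() drops the leading NUL bytes and control chars
     String charString = new String(bb.array(), charset).trim();
     // note: a '|' inside [...] is a literal, so it is kept as a valid character too
     if (charString.length() == 1 && charString.matches(charsRegex)) {
        valid.append(charString);
     }
   }
   // the repetition count is whatever sits between the braces
   int length = Integer.parseInt(
                  regex.replaceAll(".*[{]", "").replaceAll("}", ""));
   return generateID(valid.toString(), length, r);
}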

Note that the Random instance is supplied externally, because you want to use the same instance for all calls. If you create a new Random for each call, especially one seeded from the clock, several successive calls can produce sequences of identical "unique" IDs.
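The uniqueness check mentioned at the top of this answer could be sketched as follows. This is only an illustration under a few assumptions: the class name UniqueIdGenerator is made up, the set of issued IDs lives in memory (a real system would more likely check a database or other shared store), and the '|' characters in the question's character class are assumed to be unintended, so the alphabet leaves them out.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: draws random IDs and retries until one is unused.
class UniqueIdGenerator {
    // A-N, P-Z and 1-9: 34 characters ('O' and '0' excluded, '|' assumed unintended).
    static final String VALID = "ABCDEFGHIJKLMNPQRSTUVWXYZ123456789";

    private final Set<String> issued = new HashSet<>();
    private final Random random;

    UniqueIdGenerator(Random random) {
        this.random = random; // one shared instance, supplied by the caller
    }

    String next(int length) {
        // Loops until an unused ID is drawn; this never terminates once
        // every possible ID of this length has already been issued.
        while (true) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(VALID.charAt(random.nextInt(VALID.length())));
            }
            String id = sb.toString();
            if (issued.add(id)) { // add() returns false for a duplicate
                return id;
            }
        }
    }
}
```

With 34 characters and length 10 there are 34^10 (roughly 2 × 10^15) possible IDs, so retries should be rare until a large share of that space has been used.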

Upvotes: 1

Chris K

Reputation: 11927

To generate a string that matches a specific regexp from the regexp's definition, I would parse the regexp into its automaton (a graph). Then walk the automaton, similar to how regexp matchers work, but instead of matching, have it write out the edges that it traverses.
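To make the idea concrete, here is a rough sketch of such a walk. It does not parse a regexp at all: the automaton is built by hand for the simple "character class repeated n times" shape of the question, and the names AutomatonWalk, Edge, chain and walk are invented for this example. It also treats a state with no outgoing edges as the accepting state, which only holds for this chain-shaped automaton; a general one needs explicit accept flags and a policy for loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class AutomatonWalk {
    // An edge that accepts any character in [min, max] and leads to state 'target'.
    static final class Edge {
        final char min, max;
        final int target;
        Edge(char min, char max, int target) {
            this.min = min;
            this.max = max;
            this.target = target;
        }
    }

    // Builds the chain 0 -> 1 -> ... -> repeat, one edge per character range per step.
    static List<List<Edge>> chain(char[][] ranges, int repeat) {
        List<List<Edge>> states = new ArrayList<>();
        for (int s = 0; s <= repeat; s++) {
            List<Edge> edges = new ArrayList<>();
            if (s < repeat) {
                for (char[] range : ranges) {
                    edges.add(new Edge(range[0], range[1], s + 1));
                }
            }
            states.add(edges); // the last state gets no edges: it is the accepting one
        }
        return states;
    }

    // Walks from state 0, writing one character for every edge traversed.
    static String walk(List<List<Edge>> states, Random r) {
        StringBuilder out = new StringBuilder();
        int state = 0;
        while (!states.get(state).isEmpty()) {
            List<Edge> edges = states.get(state);
            // Edges are picked uniformly, so characters from short ranges
            // come up more often than characters from long ranges.
            Edge e = edges.get(r.nextInt(edges.size()));
            out.append((char) (e.min + r.nextInt(e.max - e.min + 1)));
            state = e.target;
        }
        return out.toString();
    }
}
```

For the question's pattern, the ranges would be A-N, P-Z and 1-9 with a repeat count of 10, for example walk(chain(new char[][] {{'A','N'}, {'P','Z'}, {'1','9'}}, 10), random).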

Take a look at http://hackingoff.com/compilers/regular-expression-to-nfa-dfa, and give it your regexp. It will then draw the graph that I am referring to.

Having had a hunt around the internet for you, I found an open source Java library that can generate automata from a regexp, so you may be able to use this to get you started: http://www.brics.dk/automaton/

It looks like http://code.google.com/p/xeger will do this for you.

Upvotes: 1

jamp

Reputation: 2275

If you don't need to dynamically change the regex and you don't need randomness, I would just create a method that hands out IDs sequentially, from 1111111111 up to ZZZZZZZZZZ.
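One way to do that, as a sketch only: treat a counter as a number in base 34 and write it out with the allowed characters as digits. The class name SequentialIds and the digit order (1-9 first, then the letters, so that the range runs from 1111111111 to ZZZZZZZZZZ) are assumptions made for this example, and persisting the counter between runs is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

class SequentialIds {
    // 34 "digits": 1-9, then A-N and P-Z (no '0', no 'O').
    static final String ALPHABET = "123456789ABCDEFGHIJKLMNPQRSTUVWXYZ";
    static final int LENGTH = 10;

    private static final AtomicLong counter = new AtomicLong();

    // Hands out the next ID in sequence; safe to call from several threads.
    static String next() {
        return encode(counter.getAndIncrement());
    }

    // Writes n in base 34, padded on the left with '1' (the zero digit).
    static String encode(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("counter must not be negative");
        }
        int base = ALPHABET.length();
        char[] out = new char[LENGTH];
        for (int i = LENGTH - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt((int) (n % base));
            n /= base;
        }
        if (n != 0) {
            throw new IllegalArgumentException("counter exceeds the ID space");
        }
        return new String(out);
    }
}
```

Here encode(0) gives 1111111111 and encode(34) gives 1111111121; the counter can run up to 34^10 - 1 before the ID space is exhausted.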

Upvotes: 1
