user498001
user498001

Reputation: 244

Fast ordered list matching algorithm in Java

I have a list of rules in the form

L1 -> (A, B, C)

L2 -> (D, E),

L3 -> (F, G, A),

L4 -> (C, A)

.....

This list contains ~30k such rules.

I have an input in the form (X, Y, Z)

This creates a method

List <Rule> matchRules(input)

Which belongs to a class RuleMatcher

I started with a very simple clear naive solution, in order to get the framework down, get something working.

public RuleMatcher(Collection<Rule> rules) {
   this.rules = rules;
}

public Collection<Rule> matchRules(List<Token> input) {
   List<Rule> matchingRules = new ArrayList<>();
   for(Rule r: this.rules) {
        if(r.matches(input)) {
            matchingRules.add(r);
        }
   }
   return matchingRules; 
}

Where matches is a very simple function that checks if the lengths are the same, and then checks each token as a for loop.

This matchRules function is called in the magnitude of billions of times.


Obviously this is a very poor implementation. According to my profiler at least half of the execution time is is spent in this matches function.

I was thinking of two possible solutions:

A. Some sort of Trie data structure holding the chains of rules which could be matched.

B. some sort of hash function. Each symbol is given a unique identifier. Unfortunately, there are about 8 thousand unique symbols so this might be difficult.

C. Make a hashmap conditioning on the size of the right hand side, the number of tokens in the rule. Unfortunately, the majority of the rules are about the same size, so this may not even be worthwhile.

D. Some awesome solution one of you come up with.

I hope somebody can shed some light on this problem.


Edit: A token is just an object with a unique number. For example "NN" is a token. Each instance of "NN" is exactly the same.

Matches code:

public boolean rhsMatches(List<Token> tokens) {
   if(tokens.size()!=rhsSize()) return false;
   for(int i = 0;i<rhsSize();i++) {
      if(!rightSide.get(i).equals(tokens.get(i)) {
        return false;
      }
   }
   return true;
}

Its not very pretty, but its simple.

Upvotes: 12

Views: 525

Answers (2)

janek
janek

Reputation: 184

To me it looks like a perfect scenario for engaging some worker threads. Tasks of matching seem independent of each other, divide the list of rules and delegate the matching to workers, if its possible in your situation.

Upvotes: 0

ElKamina
ElKamina

Reputation: 7807

Why not sort your rule list to begin with. Then you can binary search for the matching rule.

Upvotes: 1

Related Questions