Reputation: 2428
Given a string as a form of input (parsed from an input file) which represents a number and a mathematical operator (<, >, <=, >=, !, !=, and a few others), what is a really fast, efficient way to chop off that operator, compare it to a list of valid operators, and then set an "operator" variable to a state (i.e., Enum) representing the identified operator, then return just the number (as a string)?
I'm open to various ideas and implementations. I've tried several (about 6-7) myself, and find I'm not really satisfied with the speed. The fastest so far is a For Each loop that walks my list of "valid operators", and compares that operator's string representation against the chopped off bit from the numeric string. I determine the amount to chop off by the length of each valid operator in the valid list.
Here's a code example of the fastest implementation. Assume input like <378
and a valid ops list of <, >, !
, or >=79
and a valid ops list of <=, >=
:
Friend Function FindMatchingOp(ByVal Haystack As String,
ByVal ValidOps() As <OperatorType>) As String
Dim tmpBit As String, tmpOpName As String, tmpOpLen As Int32
For Each tmpOp As <OperatorType> In ValidOps
tmpOpName = tmpOp.Name
tmpOpLen = tmpOpName.Length
tmpBit = Strings.Left(Haystack, tmpOpLen)
If String.Equals(tmpBit, tmpOpName) Then
<Code to set the correct operator>
Return Haystack.Remove(0, tmpOpLen)
Exit For
End If
Next
Return vbNullString
End Function
Not all of the numeric strings I expect to parse will utilize the same math operators (hence the need for the ValidOps variable). Some might only support < and >, others might do <=, >=, and !. This is why I cannot hardcode the assumption that the operator will be only one character in length, and have to test for both one-or-two character operators. I believe it's these specific string checks that slow my other implementations down.
I've also tried putting ValidOps into things like a Dictionary, HashTable, ListDictionary, and even an Arraylist. The standard array beats all of them every time.
Thoughts?
PS, VB code only, please, in any advice or solutions.
EDIT:
I am going to try and implement a Trie to handle this and see what its performance is. I got the idea from this StackOverflow question. Not going to work for me.
Upvotes: 1
Views: 645
Reputation: 12102
What you want, in essence, is a parser for a number followed by an operator. You could do several things: 1. Write a good simple ad-hoc parser. 2. Write a simple inefficient ad-hoc parser (simpler and slower than 1) with for example regexes. 3. Have a parser written for you by a parser-generator (I googled http://www.codeproject.com/KB/recipes/TinyPG.aspx for you).
Your regular grammar here is
input := line*
line := number operator
number := digit+
digit := [1-9]
operator := [your|list|of|valid|operator|literals]
A simple solution would be to create a recursive decent parser. A fast solution would be to create a finite state automaton parser. A fast lazy (and thus the best) solution is have a parser-generator create a finite state automaton parser for you.
Upvotes: 0
Reputation: 3731
You should use the .NET Regex engine instead of brute-forcing comparisons against a list of operators with that foreach....
A generic way to extract the operator would be:
String Operator = Regex.Match(MathOperation as String, @"(?=\d*\s*)\w{1,2}(?=\s*)").Value;
Hope it helps :)
Upvotes: 1
Reputation: 185643
You could somewhat improve your function by changing:
tmpOpLen = tmpOpName.Length
tmpBit = Strings.Left(Haystack, tmpOpLen)
If String.Equals(tmpBit, tmpOpName) Then
<Code to set the correct operator>
Return Haystack.Remove(0, tmpOpLen)
Exit For
End If
to...
If Haystack.StartsWith(tmpOp.Name) Then
<Code to set the correct operator>
Return Haystack.Remove(0, tmpOp.Name.Length)
Exit For
End If
But that is probably going to be marginal. All you'll have is the removal of all of your intermediate strings.
Upvotes: 1