Kumba
Kumba

Reputation: 2428

What's a fast way to parse a math operator off of a string of numbers?

Given a string as a form of input (parsed from an input file) which represents a number and a mathematical operator (<, >, <=, >=, !, !=, and a few others), what is a really fast, efficient way to chop off that operator, compare it to a list of valid operators, and then set an "operator" variable to a state (i.e., Enum) representing the identified operator, then return just the number (as a string)?

I'm open to various ideas and implementations. I've tried several (about 6-7) myself, and find I'm not really satisfied with the speed. The fastest so far is a For Each loop that walks my list of "valid operators", and compares that operator's string representation against the chopped off bit from the numeric string. I determine the amount to chop off by the length of each valid operator in the valid list.

Here's a code example of the fastest implementation. Assume input like <378 and a valid ops list of <, >, !, or >=79 and a valid ops list of <=, >=:

Friend Function FindMatchingOp(ByVal Haystack As String,
                               ByVal ValidOps() As <OperatorType>) As String
    Dim tmpBit As String, tmpOpName As String, tmpOpLen As Int32

    For Each tmpOp As <OperatorType> In ValidOps
        tmpOpName = tmpOp.Name
        tmpOpLen = tmpOpName.Length
        tmpBit = Strings.Left(Haystack, tmpOpLen)

        If String.Equals(tmpBit, tmpOpName) Then
            <Code to set the correct operator>
            Return Haystack.Remove(0, tmpOpLen)
            Exit For
        End If
    Next

    Return vbNullString
End Function

Not all of the numeric strings I expect to parse will utilize the same math operators (hence the need for the ValidOps variable). Some might only support < and >, others might do <=, >=, and !. This is why I cannot hardcode the assumption that the operator will be only one character in length, and have to test for both one-or-two character operators. I believe it's these specific string checks that slow my other implementations down.

I've also tried putting ValidOps into things like a Dictionary, HashTable, ListDictionary, and even an Arraylist. The standard array beats all of them every time.

Thoughts?

PS, VB code only, please, in any advice or solutions.

EDIT: I am going to try and implement a Trie to handle this and see what its performance is. I got the idea from this StackOverflow question. Not going to work for me.

Upvotes: 1

Views: 645

Answers (3)

Martijn
Martijn

Reputation: 12102

What you want, in essence, is a parser for a number followed by an operator. You could do several things: 1. Write a good simple ad-hoc parser. 2. Write a simple inefficient ad-hoc parser (simpler and slower than 1) with for example regexes. 3. Have a parser written for you by a parser-generator (I googled http://www.codeproject.com/KB/recipes/TinyPG.aspx for you).

Your regular grammar here is

input := line*
line := number operator
number := digit+
digit := [1-9]
operator := [your|list|of|valid|operator|literals]

A simple solution would be to create a recursive decent parser. A fast solution would be to create a finite state automaton parser. A fast lazy (and thus the best) solution is have a parser-generator create a finite state automaton parser for you.

Upvotes: 0

Machinarius
Machinarius

Reputation: 3731

You should use the .NET Regex engine instead of brute-forcing comparisons against a list of operators with that foreach....

A generic way to extract the operator would be:

String Operator = Regex.Match(MathOperation as String, @"(?=\d*\s*)\w{1,2}(?=\s*)").Value;

Hope it helps :)

Upvotes: 1

Adam Robinson
Adam Robinson

Reputation: 185643

You could somewhat improve your function by changing:

tmpOpLen = tmpOpName.Length
tmpBit = Strings.Left(Haystack, tmpOpLen)

If String.Equals(tmpBit, tmpOpName) Then
    <Code to set the correct operator>
    Return Haystack.Remove(0, tmpOpLen)
    Exit For
End If

to...

If Haystack.StartsWith(tmpOp.Name) Then
    <Code to set the correct operator>
    Return Haystack.Remove(0, tmpOp.Name.Length)
    Exit For
End If

But that is probably going to be marginal. All you'll have is the removal of all of your intermediate strings.

Upvotes: 1

Related Questions