Rob
Rob

Reputation: 26324

More efficient way to determine if a string starts with a token from a set of tokens?

I'm currently doing something like this in some code I'm working on right now:

public CommandType GetCommandTypeFromCommandString(String command)
{
   if(command.StartsWith(CommandConstants.Acknowledge))
      return CommandType.Acknowledge;
   else if (command.StartsWith(CommandConstants.Status))
      return CommandType.Status;
   else if (command.StartsWith(CommandConstants.Echo))
      return CommandType.Echo;
   else if (command.StartsWith(CommandConstants.Warning))
     return CommandType.Warning;
     // and so on

   return CommandType.None;
}

I'd like to know if there is a more efficient way of doing this in C#. This code needs to execute many, many times a second, and I'm not too happy with the time it takes to do all of those string comparisons. Any suggestions? :)

Upvotes: 6

Views: 6308

Answers (11)

plinth
plinth

Reputation: 49179

Starting with a simple StartsWith function against an array as an extension method:

public static bool StartsWith(this string s, params string[] compareWithArray)
{
    return compareWithArray.Any(compareWith => s.StartsWith(compareWith));
}

Now this is just a predicate that returns whether or not a string in an array starts with a given string, but you could modify it somewhat:

public static int PrefixIndex(this string s, params string[] candidate)
{
    int index = -1;
    string match = candidate.FirstOrDefault(t => { index++; return s.StartsWith(t); });
    return match == default(string) ? -1 : index;
}

and in usage it would be:

int index = command.PrefixIndex(tokenStrings);

if (index >= 0) {
    // convert to your enum
}

In a comment I saw that you wanted to do 30 string compares in 1/40th of a second. My friend, you should be able to do that on a 1 MHz machine. It should be no sweat to do thousands of string compares in 1/40th of a second.

Upvotes: 1

BenAlabaster
BenAlabaster

Reputation: 39836

Edit: In light of a misunderstanding of one of the caveats of StopWatch, my original answer doesn't perform so well as StartsWith in combination with StringComparison.Ordinal. Even if you compile the regex with all the correct options it is a little slower, with performance comparable to using StartsWith without any StringComparison settings. However, the regular expression route does give you more flexibility to match patterns whereas StartsWith doesn't, so I've left my original answer for posterity...

Original Answer:

I have to admit, I'm not sure exactly what you're looking for - however, it seems to me that this kind of code is generally parsing a log file of some description looking to pull out useful information. To save you doing all the string comparisons long hand you could generate a regular expression of the values in your enumeration and then parse the correct enum item using the matched item:

public enum CommandType
{
    Acknowledge,
    Status,
    Echo,
    Warning
}
static public CommandType? GetCommandTypeFromString(String command)
{
    var CommandTypes = String.Join("|", Enum.GetNames(typeof(CommandType)));
    var PassedConstant = Regex.Match(command, String.Format("(?i:^({0}))", CommandTypes)).Value;
    if (PassedConstant != String.Empty)
        return (CommandType)Enum.Parse(typeof(CommandType), PassedConstant, true);
    return null;
}

static void Main(string[] args)
{
    Console.WriteLine(GetCommandTypeFromString("Acknowledge that I am great!").ToString());
}

This would pull out CommandType.Acknowledge from the beginning of my string, but only if it existed at the start of the string... it would also pull out the other correct CommandTypes.

Doing similar benchmarking to the accepted answer, I got about a 40% performance increase. I ran the accepted code over a million iterations in 10421ms, but mine ran in only 6459ms.

Of course, while the if statement you're using may not look as efficient as you'd like, it's still easier to read than using the regex form...

Upvotes: 2

Robert Rossney
Robert Rossney

Reputation: 96712

A trie's almost certainly going to be the fastest approach. If the prefixes were the same length, you might be able to come up with a faster implementation by hashing the prefix to get an index into an array, but that breaks down if you don't know how many bytes to hash.

Upvotes: 1

Perry Neal
Perry Neal

Reputation: 765

NOTE: I'm not demonstrating using Exception handling to control program flow. Enum.Parse will throw an exception if the string name of the Enum doesn't exist. The Catch clause just returns the default CommandType of None like the questioner's sample code does.

If the object is just to return the actual Enum object given the string name couldn't you use:

try
{
    return (CommandType)Enum.Parse(typeof(CommandType), strCmdName, true);
}
catch (Exception)
{
    return CommandType.None;
}

Upvotes: -3

Markus Olsson
Markus Olsson

Reputation: 22580

One optimization would be to use the StringComparison enum to specify that you only want ordinal comparison. Like this:

if(command.StartsWith(CommandConstants.Acknowledge, StringComparison.Ordinal))
    return CommandType.Acknowledge;

If you don't specify a string comparison method the current culture will be used for comparison and that slows things down a bit.

I did some (really really naive) benchmarking:

var a = "foo bar foo";
var b = "foo";

int numTimes = 1000000;

Benchmark.Time(() => a.StartsWith(b, StringComparison.Ordinal), "ordinal", numTimes);
Benchmark.Time(() => a.StartsWith(b), "culture sensitive", numTimes);

Which produced the following results:

ordinal ran 1000000 times in 35.6033 ms
culture sensitive ran 1000000 times in 175.5859 ms

You should also order your comparisons so that the most likely tokens are being compared first (happy path).

Theese optimizations are a simple way of making your current implementation perform better but if performance really is critical (and I mean really critical) you should be looking towards implementing some sort of state machine.

Upvotes: 12

armandino
armandino

Reputation: 18528

I think you should go for more readability than worry about efficiency. These operations are pretty fast. I second Serge, it's unlikely this part of the code is the bottleneck. I would do something like this:

public CommandType GetCommandTypeFromCommandString(String command)
{
   for(String currentCommand : allCommands) {
       if(command.StartsWith(currentCommand))
           return currentCommand;
   }
   return CommandType.None;
}

EDIT: As an afterthought, if you know which commands are used most frequently, you could order your array so those commands are at the start... you can also do this with if statements if you keep them.

Upvotes: 2

Jim Mischel
Jim Mischel

Reputation: 133995

I think you could do better with a regular expression and a Dictionary:

static Regex reCommands = new Regex("^(cmd1|cmd2|cmd3|cmd4)", RegexOptions.Compiled);
static Dictionary<string, CommandType> Commands = new Dictionary<string, CommandType>();
private static InitDictionary()
{
    Commands.Add("cmd1", cmdType1);
    Commands.Add("cmd2", cmdType2);
    Commands.Add("cmd3", cmdType3);
    Commands.Add("cmd4", cmdType4);
}
public CommandType GetCommandTypeFromCommandString(String command)
{
    Match m = reCommands.Match(command);
    if (m.Success)
    {
        return Commands[m.Groups[1].Value];
    }
    return CommandType.None; // no command
}

Upvotes: 3

Kothar
Kothar

Reputation: 6629

Similar in concept to the FSM answer by Vojislav, you could try putting the constants into a trie. You can then replace your sequence of comparisons with a single traversal of the trie.

There is a C# trie implementation described here.

Upvotes: 7

Serge Wautier
Serge Wautier

Reputation: 21878

How many times is "many many"? I seriously doubt this part of the code is the bottleneck. You could probably optimize it a tiny tiny little bit with a switch statement based on the first letter of each command, assuming they are all different.

But then again, it is really useful? I wouldn't bet much on it.

Upvotes: 1

Create an

IDictionary<string,CommandType> 

and populate it with the values.

You don't need to compare everything...just look it up in the table.

You'll also need to define your command syntax a little better. Require a space between the command and the rest of the line for example...

Upvotes: 1

Vojislav Stojkovic
Vojislav Stojkovic

Reputation: 8153

It's quite a bit of work, but you could construct an FSM based on each of the tokens. The FSM would accept characters from the command string one by one; it would have one final state for each token and an additional final state to which to go when the command doesn't start with any of the tokens.

Upvotes: 3

Related Questions