user1853517


C# Regex Optimization

I have a C# application in which I use Regex to run an expect against a Unix response. I currently have this:

//will pick up :
//  What is your name?:
//  [root@localhost ~]#
//  [root@localhost ~]$
//  Do you want to continue [y/N]
//  Do you want to continue [Y/n]
const string Command_Prompt_Only = @"[$#]|\[.*@(.*?)\][$%#]";
const string Command_Question_Only = @".*\?:|.*\[y/N\]/g";
const string Command_Prompt_Question = Command_Question_Only + "|" + Command_Prompt_Only;

This works, as I've tested it with www.regexpal.com, but I think it needs some optimization: at times it seems to slow way down when I use Command_Prompt_Question.

var promptRegex = new Regex(Command_Prompt_Question);
var output = _shellStream.Expect(promptRegex, timeOut);

I should mention that I'm using SSH.NET to talk to these Linux servers, but I don't think it's an SSH.NET issue, because when I use Command_Prompt_Only it's fast.

Does anyone see any issues with the const string I'm using? Is there a better way to do it?

My project is open source if you feel like you want to go play with it.
https://github.com/gavin1970/Linux-Commander

Code in question: https://github.com/gavin1970/Linux-Commander/blob/master/Linux-Commander/common/Ssh.cs

It's called Linux Commander, and I'm attempting to build a virtual Linux console with Ansible support.

Upvotes: 0

Views: 86

Answers (2)

Dai

Reputation: 155055

Try this:

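using System.Text.RegularExpressions;

class Foo
{
    const string Command_Prompt_Only     = @"[$#]|\[.*@(.*?)\][$%#]";
    // The trailing "/g" is a JavaScript flag, not .NET syntax, so it is removed here.
    const string Command_Question_Only   = @".*\?:|.*\[y/N\]";

    // Wrap each half in a non-capturing group so the outer alternation is explicit.
    const string Command_Prompt_Question = "(?:" + Command_Question_Only + ")|(?:" + Command_Prompt_Only + ")";

    // Build the Regex once and reuse it instead of constructing it on every call.
    private static readonly Regex _promptRegex = new Regex( Command_Prompt_Question, RegexOptions.Compiled );

    // A method cannot share its enclosing type's name in C#, so this is not called Foo.
    public void Run()
    {
        // ...

        var output = _shellStream.Expect( _promptRegex, timeOut );
    }
}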

Upvotes: -1

ΩmegaMan

Reputation: 31596

Does anyone see any issues with the const string I'm using?

Yes, there is too much backtracking in those patterns.

If one knows that there is at least one item, specifying a * (zero or more) can cause the parser to consider many zero-length possibilities. It's better to prefer the + (one or more) quantifier, which can shave a lot of time off exploring dead ends while backtracking.
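
As a rough illustration of the * versus + point, here is a minimal sketch applied to the question patterns. The class and method names (QuestionPatterns, MatchesStar, MatchesPlus) are hypothetical and exist only for this example, and the stray "/g" from the question is left out. It only shows the difference in what the two forms accept; whether the + form is measurably faster would need to be benchmarked against real shell output.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of SSH.NET or the project.
public static class QuestionPatterns
{
    // Form from the question (without "/g"): the text before the marker may be empty.
    public const string StarForm = @".*\?:|.*\[y/N\]";

    // Same alternatives with "+": at least one character must precede the marker.
    public const string PlusForm = @".+\?:|.+\[y/N\]";

    public static bool MatchesStar(string text) => Regex.IsMatch(text, StarForm);

    public static bool MatchesPlus(string text) => Regex.IsMatch(text, PlusForm);
}
```

With these definitions, a bare "?:" is accepted by the * form but rejected by the + form, while a real question such as "What is your name?:" is accepted by both.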


This part is interesting: \[.*@(.*?)\]. Why not use a negated set ([^ ]) instead, such as this change:

\[[^@]+@[^\]]+\]

This says: anchor on a literal "[", then find one or more characters that are not a literal "@" ([^@]+), then the "@" itself, then one or more characters that are not a literal "]" ([^\]]+), followed by the closing "]".
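
A minimal sketch of how the negated-set pattern could be tried against the prompts listed in the question. The PromptPatterns class and IsBracketPrompt method are hypothetical names for this example, the trailing [$%#] is carried over from the question's original pattern, and the closing bracket inside the second set is escaped and closed as [^\]]+ so that .NET parses it as a complete character class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of SSH.NET or the project.
public static class PromptPatterns
{
    // "[", one or more non-"@" characters, "@", one or more non-"]" characters,
    // "]", then one prompt character taken from the question's original pattern.
    public const string BracketPrompt = @"\[[^@]+@[^\]]+\][$%#]";

    // Constructed once and reused for every check.
    public static readonly Regex BracketPromptRegex =
        new Regex(BracketPrompt, RegexOptions.Compiled);

    public static bool IsBracketPrompt(string text) =>
        BracketPromptRegex.IsMatch(text);
}
```

Under these assumptions, IsBracketPrompt returns true for "[root@localhost ~]#" and "[root@localhost ~]$", and false for "Do you want to continue [y/N]", because that text contains no "@" after the opening bracket.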

Upvotes: 1
