weston

Reputation: 54781

Regex extract optional group

I have some log strings in the format:

T01: Warning: Tag1: Message

T23: Tag2: Message2

I am trying to extract the T number, detect whether "Warning:" is present, and capture the Tag and Message text, all in one regex. The optional "Warning:" part is what trips me up, though.

    private const string RegexExpression = @"^T(?<Number>\d+): (?<Warning>Warning:)? (?<Tag>[^:]+): (?<Message>.*)";
    private const string Message = "blar blar blar: some message";

    //this test works
    [TestMethod]
    public void RegExMatchByTwoNamedGroupsWarningTest()
    {
        var rex = new Regex(RegexExpression);
        const string wholePacket = "T12: Warning: logtag: " + Message;
        var match = rex.Match(wholePacket);
        Assert.IsTrue(match.Groups["Warning"].Success); //warning is present
        Assert.IsTrue(match.Success);
        Assert.IsTrue(match.Groups["Number"].Success);
        Assert.AreEqual("12", match.Groups["Number"].Value);
        Assert.IsTrue(match.Groups["Tag"].Success);
        Assert.AreEqual("logtag", match.Groups["Tag"].Value);
        Assert.IsTrue(match.Groups["Message"].Success);
        Assert.AreEqual(Message, match.Groups["Message"].Value);
    }

    [TestMethod]
    public void RegExMatchByTwoNamedGroupsNoWarningTest()
    {
        var rex = new Regex(RegexExpression);
        const string wholePacket = "T12: logtag: " + Message;
        var match = rex.Match(wholePacket);
        Assert.IsFalse(match.Groups["Warning"].Success); //warning is missing
        Assert.IsTrue(match.Success); //fails
        Assert.IsTrue(match.Groups["Number"].Success); //fails
        Assert.AreEqual("12", match.Groups["Number"].Value);
        Assert.IsTrue(match.Groups["Tag"].Success); //fails
        Assert.AreEqual("logtag", match.Groups["Tag"].Value);
        Assert.IsTrue(match.Groups["Message"].Success); //fails
        Assert.AreEqual(Message, match.Groups["Message"].Value);
    }
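To take MSTest out of the picture, here is a minimal console reproduction. It is a sketch using C# top-level statements (so it assumes C# 9 / .NET 5 or later), with illustrative variable names and the question's test strings. It shows that the whole match fails on the second line, not just the Warning group:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question: note the literal space on BOTH sides of the
// optional (?<Warning>Warning:)? group.
const string Pattern = @"^T(?<Number>\d+): (?<Warning>Warning:)? (?<Tag>[^:]+): (?<Message>.*)";
var rex = new Regex(Pattern);

// Illustrative inputs, taken from the two unit tests (variable names are illustrative).
Match withWarning = rex.Match("T12: Warning: logtag: blar blar blar: some message");
Match withoutWarning = rex.Match("T12: logtag: blar blar blar: some message");

// With the warning present, one space is matched before "Warning:" and one after it.
Console.WriteLine($"with warning:    {withWarning.Success}");
// Without it, the optional group matches nothing, so the pattern still needs two
// spaces in a row after "T12:", and the input only has one.
Console.WriteLine($"without warning: {withoutWarning.Success}");
```

This prints True for the first line and False for the second.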

Upvotes: 3

Views: 471

Answers (4)

Anirudha

Reputation: 32787

This regex tolerates varying whitespace around the separators:

@"^T(?'Number'\d+)\s*:\s*((?'Warning'\bWarning\b)\s*:)?\s*(?'Tag'.*?Tag.*?):\s*(?'Message'.*?)$"

Use this Regex with RegexOptions.IgnoreCase
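A quick check of this pattern against the question's two test strings (a sketch using C# top-level statements, C# 9 or later; variable names and the Describe helper are illustrative). Note that the Tag group, .*?Tag.*?, relies on the tag containing the text "Tag" (case-insensitively with RegexOptions.IgnoreCase, which is why "logtag" works); other tag names are not handled reliably. The Warning group here captures "Warning" without the colon:

```csharp
using System;
using System.Text.RegularExpressions;

// Anirudha's pattern, used with RegexOptions.IgnoreCase as the answer suggests.
// The Tag group (.*?Tag.*?) depends on the tag containing "tag" in some casing.
const string Pattern =
    @"^T(?'Number'\d+)\s*:\s*((?'Warning'\bWarning\b)\s*:)?\s*(?'Tag'.*?Tag.*?):\s*(?'Message'.*?)$";
var rex = new Regex(Pattern, RegexOptions.IgnoreCase);

// Illustrative inputs, taken from the question's unit tests.
Match withWarning = rex.Match("T12: Warning: logtag: blar blar blar: some message");
Match withoutWarning = rex.Match("T12: logtag: blar blar blar: some message");

// Hypothetical helper: formats the named groups of one match for printing.
static string Describe(Match m) =>
    $"Success={m.Success}, Number={m.Groups["Number"].Value}, " +
    $"Warning={m.Groups["Warning"].Success}, Tag={m.Groups["Tag"].Value}, " +
    $"Message={m.Groups["Message"].Value}";

Console.WriteLine(Describe(withWarning));
Console.WriteLine(Describe(withoutWarning));
```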

Upvotes: 1

inhan

Reputation: 7470

@"^T(?<Number>\d+): ((?<Warning>Warning:.*?) )?(?<Tag>[^:]+): (?<Message>.*)$";

I'm not sure about the end-of-line anchor ($), because I'm not familiar with C#.
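A minimal check of this approach (a sketch using C# top-level statements, C# 9 or later; variable names are illustrative). It compares a greedy .* after "Warning:" with the lazy .*? form: because the question's Message constant itself contains a colon, the greedy version gives back characters only until the rest of the pattern can match, so the Warning group swallows the tag and Tag ends up as "blar". On the $ question: in .NET, without RegexOptions.Multiline, $ matches at the end of the input or just before a final newline, so it behaves as intended here.

```csharp
using System;
using System.Text.RegularExpressions;

const string Line = "T12: Warning: logtag: blar blar blar: some message";

// Greedy .* inside the Warning group: it runs to the end of the line, then backtracks
// only as far as the LAST spot where " <tag>: <message>" can still match.
var greedy = new Regex(@"^T(?<Number>\d+): ((?<Warning>Warning:.*) )?(?<Tag>[^:]+): (?<Message>.*)$");
// Lazy .*? inside the Warning group: it tries the shortest text first, so the group stops
// right after "Warning:".
var lazy = new Regex(@"^T(?<Number>\d+): ((?<Warning>Warning:.*?) )?(?<Tag>[^:]+): (?<Message>.*)$");

Match greedyMatch = greedy.Match(Line);
Match lazyMatch = lazy.Match(Line);
Match noWarningMatch = lazy.Match("T12: logtag: blar blar blar: some message");

// Here Warning is "Warning: logtag: blar blar" and Tag is "blar".
Console.WriteLine($"greedy: Warning='{greedyMatch.Groups["Warning"].Value}', Tag='{greedyMatch.Groups["Tag"].Value}'");
// Here Warning is "Warning:" and Tag is "logtag".
Console.WriteLine($"lazy:   Warning='{lazyMatch.Groups["Warning"].Value}', Tag='{lazyMatch.Groups["Tag"].Value}'");
Console.WriteLine($"no warning: Warning matched={noWarningMatch.Groups["Warning"].Success}, Tag='{noWarningMatch.Groups["Tag"].Value}'");
```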

Upvotes: 1

Chris

Reputation: 27599

Your problem is the whitespace in your regex.

When the warning is absent, the regex still has to match both the space before the optional warning group and the space after it, which means two consecutive spaces after "T12:". You only want to match one of them.

The solution is to move one of the spaces inside the optional group along with the warning, i.e.:

^T(?<Number>\d+): (?<Warning>Warning: )?(?<Tag>[^:]+): (?<Message>.*)
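Checked against both of the question's inputs (a sketch using C# top-level statements, C# 9 or later; variable names are illustrative), this pattern matches both lines and captures the same Number, Tag and Message:

```csharp
using System;
using System.Text.RegularExpressions;

// The corrected pattern: the space after "Warning:" now sits inside the optional group,
// so without a warning exactly one space separates "T12:" from the tag.
var rex = new Regex(@"^T(?<Number>\d+): (?<Warning>Warning: )?(?<Tag>[^:]+): (?<Message>.*)");

Match withWarning = rex.Match("T12: Warning: logtag: blar blar blar: some message");
Match withoutWarning = rex.Match("T12: logtag: blar blar blar: some message");

// Print every named group for both inputs.
foreach (Match m in new[] { withWarning, withoutWarning })
{
    Console.WriteLine(
        $"Success={m.Success}, Number={m.Groups["Number"].Value}, " +
        $"Warning={m.Groups["Warning"].Success}, Tag={m.Groups["Tag"].Value}, " +
        $"Message={m.Groups["Message"].Value}");
}
```

One side effect: the Warning group's value now includes the trailing space ("Warning: "). If you need the captured value to be exactly "Warning:", keep the space in a non-capturing group around the named one: (?:(?<Warning>Warning:) )?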

Upvotes: 1

ie.

Reputation: 6101

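Setting RegexOptions.IgnorePatternWhitespace alone does not help:

var rex = new Regex(RegexExpression, RegexOptions.IgnorePatternWhitespace);

That option removes the literal spaces from the pattern, so nothing matches the spaces in the input: "Warning:" no longer matches right after "T12:", and the Tag and Message groups capture leading spaces. Instead, update the regex pattern so that each literal space becomes \s*, which also accepts zero spaces: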

private const string RegexExpression = @"^T(?<Number>\d+):\s*(?<Warning>Warning:)?\s*(?<Tag>[^:]+):\s*(?<Message>.*)";
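A side-by-side check of both suggestions (a sketch using C# top-level statements, C# 9 or later; variable names are illustrative, and the inputs are the question's test strings). It shows that IgnorePatternWhitespace applied to the question's unchanged pattern misparses the warning line, while the \s* pattern handles both lines:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern, unchanged.
const string Original = @"^T(?<Number>\d+): (?<Warning>Warning:)? (?<Tag>[^:]+): (?<Message>.*)";
// The updated pattern: each literal space becomes \s*, which can also match zero spaces.
const string Flexible = @"^T(?<Number>\d+):\s*(?<Warning>Warning:)?\s*(?<Tag>[^:]+):\s*(?<Message>.*)";

const string WithWarningLine = "T12: Warning: logtag: blar blar blar: some message";
const string NoWarningLine = "T12: logtag: blar blar blar: some message";

// IgnorePatternWhitespace drops the literal spaces from Original, so the input's spaces
// end up inside the captures instead of being matched by the pattern.
var ignoreWs = new Regex(Original, RegexOptions.IgnorePatternWhitespace);
var flexible = new Regex(Flexible);

Match ignoreWsMatch = ignoreWs.Match(WithWarningLine);
Match flexWarning = flexible.Match(WithWarningLine);
Match flexNoWarning = flexible.Match(NoWarningLine);

// Warning is not detected and Tag comes out as " Warning" (with a leading space).
Console.WriteLine($"IgnorePatternWhitespace: Warning={ignoreWsMatch.Groups["Warning"].Success}, Tag='{ignoreWsMatch.Groups["Tag"].Value}'");
// Both lines parse with Tag "logtag"; only the first reports a warning.
Console.WriteLine($"\\s* with warning:    Warning={flexWarning.Groups["Warning"].Success}, Tag='{flexWarning.Groups["Tag"].Value}'");
Console.WriteLine($"\\s* without warning: Warning={flexNoWarning.Groups["Warning"].Success}, Tag='{flexNoWarning.Groups["Tag"].Value}'");
```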

Upvotes: 1
