jeremy

Reputation: 1605

Optional regex match for a group doesn't work

I have the following sentence:

#bb John can #20 jiang stone [voila]

I want my C# regex to capture five groups:

#bb
John can
20
jiang stone
voila

The tokens in the #bb and voila positions are optional.

The following regex works nicely on a sentence that doesn't have the leading #bb. For example,

John can #20 jiang stone [voila]

gives me 4 correct tokens with the expression

@"(.*)#(\d+)(.*\s)(?:\[(.*)\])?"

Yet when I extend this with

@"(?:#[a-zA-Z])?(.*)#(\d+)(.*\s)(?:\[(.*)\])?"

It doesn't work. The #bb at the beginning of the sentence isn't matched as a separate token; instead, the first token comes back as

b John can

I've tried several variations, but none gives me an optional match for the leading #.. token. I want it to be a # followed by one or two characters, and I want it to be optional: it may be present or missing, and in either case the rest of the tokens should still be returned.

What's wrong with my regex?

Thanks for your help

Upvotes: 1

Views: 1186

Answers (1)

ruakh

Reputation: 183436

This:

#[a-zA-Z]

means a # followed by a single ASCII letter. You want this:

#[a-zA-Z]{1,2}

in order to allow one or two ASCII letters.
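
As a small illustration of the difference, the sketch below matches each prefix pattern against the start of a string. The class and method names (PrefixDemo, MatchPrefix) are made up for this example, and anchoring with ^ is an assumption added only to keep the comparison focused on the prefix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixDemo
{
    // Hypothetical helper: returns the text that the given prefix pattern
    // matches at the very start of the input (empty string if no match).
    public static string MatchPrefix(string prefixPattern, string input)
    {
        // "^" anchors the pattern to the start of the input (an assumption
        // made for this demo; the original pattern is not anchored).
        return Regex.Match(input, "^" + prefixPattern).Value;
    }
}
```

With the input "#bb John", MatchPrefix("#[a-zA-Z]", ...) returns "#b", leaving the second b for whatever follows, while MatchPrefix("#[a-zA-Z]{1,2}", ...) returns "#bb".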

In addition, this:

(?:...)

means a non-capturing group. If you want a token to show up in your results, you need to wrap it in capturing parentheses:

(...)

So, putting it together:

@"((?:#[a-zA-Z]{1,2})?)(.*)#(\d+)(.*\s)(?:\[(.*)\])?"

(Note: it's not obvious to me how you want the whitespace to be handled; you may need to tweak the above a bit for your needs. Note, in particular, that if there's whitespace between the first two tokens, the above pattern will treat it as being part of the second token.)
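
For reference, here is a minimal sketch of how the combined pattern could be used from C#. The names TokenDemo and Tokenize are hypothetical, and trimming each group with Trim() is just one assumed way of dealing with the whitespace issue mentioned in the note; it is not necessarily what you want.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // The combined pattern from above, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"((?:#[a-zA-Z]{1,2})?)(.*)#(\d+)(.*\s)(?:\[(.*)\])?");

    // Hypothetical helper: returns the five captured groups in order.
    // Each value is trimmed (an assumption about whitespace handling);
    // an optional token that is absent comes back as an empty string.
    public static string[] Tokenize(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return new string[0];
        }

        var tokens = new string[5];
        for (int i = 1; i <= 5; i++)
        {
            tokens[i - 1] = m.Groups[i].Value.Trim();
        }
        return tokens;
    }
}
```

Calling Tokenize("#bb John can #20 jiang stone [voila]") yields "#bb", "John can", "20", "jiang stone", "voila". Without the leading #bb, the first element is an empty string and the other four are the same.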

Upvotes: 4
