Reputation: 1605
I have the following sentence:
#bb John can #20 jiang stone [voila]
I want my C# regex to capture five tokens, one per group:
#bb
John can
20
jiang stone
voila
The tokens in the #bb and voila positions are optional.
I used the following regular expression, which works nicely for a sentence that doesn't have the leading #bb. For example,
John can #20 jiang stone [voila]
gives me 4 correct tokens with the expression
@"(.*)#(\d+)(.*\s)(?:\[(.*)\])?"
Yet when I extend this with
@"(?:#[a-zA-Z])?(.*)#(\d+)(.*\s)(?:\[(.*)\])?"
it doesn't work. The #bb at the beginning of the sentence isn't matched as a separate token. Instead, the first group comes back as
b John can
I've tried several variations, but none of them gives me an optional match for the leading #.. token. What I want is for this token to be a # followed by one or two characters, and for it to be optional: it may be present or missing, and in either case the rest of the tokens should still be returned.
What's wrong with my regex?
Thanks for your help
Upvotes: 1
Views: 1186
Reputation: 183436
This:
#[a-zA-Z]
means a # followed by a single ASCII letter. You want this:
#[a-zA-Z]{1,2}
in order to allow one or two ASCII letters.
In addition, this:
(?:...)
means a non-capturing group. If you want a token to show up in your results, you need to wrap it in capturing parentheses:
(...)
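To make the difference between the two kinds of group concrete, here is a minimal sketch. The class name `GroupDemo` and the helper `FirstGroup` are made up for illustration; only `Regex.Match` and `Match.Groups` come from .NET. It runs the same optional prefix against the same input, once in a non-capturing group and once wrapped in capturing parentheses, and reports what ends up in group 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns the text captured by group 1,
    // or "" if the pattern does not match at all.
    public static string FirstGroup(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Non-capturing prefix: "#bb" is consumed but not reported,
        // so group 1 is whatever (.*) matched.
        Console.WriteLine("[" + FirstGroup(@"(?:#[a-zA-Z]{1,2})?(.*)", "#bb John") + "]");
        // prints: [ John]

        // Capturing parentheses around the optional prefix:
        // group 1 is now the prefix itself.
        Console.WriteLine("[" + FirstGroup(@"((?:#[a-zA-Z]{1,2})?)(.*)", "#bb John") + "]");
        // prints: [#bb]
    }
}
```

Placing the `?` inside the capturing parentheses means group 1 always takes part in the match and is simply empty when the prefix is absent.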
So, putting it together:
@"((?:#[a-zA-Z]{1,2})?)(.*)#(\d+)(.*\s)(?:\[(.*)\])?"
(Note: it's not obvious to me how you want the whitespace to be handled; you may need to tweak the above a bit for your needs. Note, in particular, that if there's whitespace between the first two tokens, the above pattern will treat it as being part of the second token.)
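As a rough sketch of how the combined pattern could be used, the snippet below applies it to both sentences from the question. The class name `TokenDemo` and the helper `Tokenize` are hypothetical, and calling `Trim()` on each group is just one assumed way to deal with the whitespace issue mentioned in the note; adjusting the pattern itself would be another.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // The combined pattern from above, unchanged.
    private static readonly Regex Pattern =
        new Regex(@"((?:#[a-zA-Z]{1,2})?)(.*)#(\d+)(.*\s)(?:\[(.*)\])?");

    // Hypothetical helper: returns groups 1..5 with surrounding whitespace
    // trimmed. An absent optional token comes back as "".
    public static string[] Tokenize(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success) return new string[0];
        return Enumerable.Range(1, 5)
                         .Select(i => m.Groups[i].Value.Trim())
                         .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Tokenize("#bb John can #20 jiang stone [voila]")));
        // prints: #bb|John can|20|jiang stone|voila

        Console.WriteLine(string.Join("|", Tokenize("John can #20 jiang stone [voila]")));
        // prints: |John can|20|jiang stone|voila
    }
}
```

If you need to tell "token absent" apart from "token present but empty", checking `m.Groups[i].Success` for the bracketed token is an option, since that group sits inside an optional non-capturing group and does not participate when the brackets are missing.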
Upvotes: 4