Mark Williams

Reputation: 1310

Regular expression matching with spaces

I have some text that I want to match and replace in C#.

The text looks something like this, and it can occur multiple times in a string:

This is some content with a !!Some link text here this can be anything::/something/something/url.html!! inside it

I'm currently using this regex and replace, but it's not working: it only seems to work if there are no spaces in the values.

Regex r = new Regex("!!(?<first>\\S+)::(?<last>\\S+)!!");

content = r.Replace(content, delegate(Match match) { return ReturnCustomSpan(match.Groups[1].Value, match.Groups[2].Value); });

Can anyone help please? I'm a regex noob and I can't figure this one out.

Upvotes: 2

Views: 253

Answers (3)

user557597

\S was your problem, but as Igor Korkhov mentioned, you will run into
trouble if the content ever gets out of sync with your delimiters.

There is no real safeguard against this. By declaring that the
delimiters are !! and ::, you commit to those sequences appearing in
the content only as delimiters, never as part of the text.

If they really only ever appear as delimiters, then you have to use the
non-greedy approach Igor mentioned; otherwise one match can overrun into the next.

If they can also appear as text outside the delimited blocks, and every
block has exactly the form !!...::...!!, then there is really only one way
to parse it out: forbid each field from containing either delimiter.

!!((?:(?!::|!!)[\s\S])*)::((?:(?!!!|::)[\s\S])*)!!
or
!!(?<first>(?:(?!::|!!)[\s\S])*)::(?<last>(?:(?!!!|::)[\s\S])*)!!
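A minimal sketch of the named-group pattern above in use. It assumes a hypothetical `ReturnCustomSpan` that just wraps the two fields in a `<span>` (the question doesn't show the real one), reads the groups by name instead of by index, and passes a lambda as the `MatchEvaluator` instead of an anonymous `delegate`.

```csharp
using System;
using System.Text.RegularExpressions;

// Tempered pattern: each field may contain anything (spaces, newlines, ...)
// except the delimiter sequences "!!" and "::".
var pattern = new Regex(@"!!(?<first>(?:(?!::|!!)[\s\S])*)::(?<last>(?:(?!!!|::)[\s\S])*)!!");

// Hypothetical stand-in for the question's ReturnCustomSpan; the real one isn't shown.
string ReturnCustomSpan(string text, string url) =>
    $"<span data-url=\"{url}\">{text}</span>";

// Replace every !!text::url!! block, reading the groups by name.
string Render(string content) =>
    pattern.Replace(content, m => ReturnCustomSpan(m.Groups["first"].Value, m.Groups["last"].Value));

var input = "This is some content with a !!Some link text here this can be anything::/something/something/url.html!! inside it";
Console.WriteLine(Render(input));

// A stray "!!" is left alone instead of being swallowed into the next block.
Console.WriteLine(Render("!! stray !!a::b!!"));
```

With the lazy `.+?` pattern, the stray-`!!` input would instead produce a `first` group of `" stray !!a"`, which is the overrun described above.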

Upvotes: 0

Igor Korkhov

Reputation: 8558

Try this:

!!(?<first>.+?)::(?<last>.+?)!!

It uses non-greedy quantifiers (.+?), so the regex will correctly match a string like this:

This is some content with a !!Some link text here this can be anything::/something/something/url.html!! :: inside it!!

Otherwise the greedy .+ will "eat" everything from the first occurrence of !! to the last one, which is probably not what you expect.
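A small sketch that runs both the greedy and the non-greedy pattern against the example string above, so the difference in the captured groups is visible.

```csharp
using System;
using System.Text.RegularExpressions;

var text = "This is some content with a !!Some link text here this can be anything::/something/something/url.html!! :: inside it!!";

// Greedy: .+ grabs as much as it can, then backtracks only as far as the LAST "::" and "!!".
var greedy = Regex.Match(text, @"!!(?<first>.+)::(?<last>.+)!!");

// Non-greedy: .+? grabs as little as it can, stopping at the FIRST "::" and "!!".
var lazy = Regex.Match(text, @"!!(?<first>.+?)::(?<last>.+?)!!");

Console.WriteLine($"greedy first: [{greedy.Groups["first"].Value}]");
Console.WriteLine($"greedy last:  [{greedy.Groups["last"].Value}]");
Console.WriteLine($"lazy first:   [{lazy.Groups["first"].Value}]");
Console.WriteLine($"lazy last:    [{lazy.Groups["last"].Value}]");
```

The greedy `first` group ends up containing the closing `!!` and the second `::`'s leading text, while the lazy groups hold exactly the link text and the URL.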

Upvotes: 2

MNGwinn

Reputation: 2394

\S matches any non-whitespace character, so you're explicitly excluding spaces. If you want to match any character, use .+ instead of \S+ (note that . does not match \n unless you pass RegexOptions.Singleline).
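A quick sketch of the difference, using the question's example text: the original `\S+` pattern finds no match at all because of the space in "Some link", while `.+` matches. The last two lines show the newline caveat for `.`.

```csharp
using System;
using System.Text.RegularExpressions;

var text = "This is some content with a !!Some link text here this can be anything::/something/something/url.html!! inside it";

// \S matches any character EXCEPT whitespace, so "Some link" breaks the match.
var nonSpace = new Regex(@"!!(?<first>\S+)::(?<last>\S+)!!");
// . matches any character except \n (unless RegexOptions.Singleline is set).
var anyChar = new Regex(@"!!(?<first>.+)::(?<last>.+)!!");

Console.WriteLine(nonSpace.IsMatch(text)); // False: Replace would leave the text unchanged
var m = anyChar.Match(text);
Console.WriteLine(m.Groups["first"].Value);
Console.WriteLine(m.Groups["last"].Value);

// A line break inside the link text stops . unless Singleline is used.
var multiLine = "!!Some link\ntext::/url.html!!";
Console.WriteLine(anyChar.IsMatch(multiLine));
Console.WriteLine(Regex.IsMatch(multiLine, @"!!(?<first>.+)::(?<last>.+)!!", RegexOptions.Singleline));
```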

Upvotes: 4
