Ryan Guill

Reputation: 13886

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions, though. I feel that if I can just get my head around the first one, the rest will click. Here is an example:

The following is sample input:

You can make things **bold** or //italic// or **//both//** or //**both**//.

Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.

Not bold. Character formatting does not cross paragraph boundaries.

My first attempt was:

<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />

Then I realized that it would not match where the closing ** is missing, and that the bold text should end where there are two consecutive line breaks (a paragraph boundary).

So I tried this:

<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />

It is close, but for some reason it produces this:

You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.

Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.

Not bold. Character formatting does not cross paragraph boundaries.

Any ideas?

PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.

Upvotes: 1

Views: 1183

Answers (5)

scott h

Reputation: 11

I know this is an older question, but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match": for ColdFusion you should use \1 instead of $1.

Upvotes: 1

Ryan McIlmoyl

Reputation: 221

I find this app immensely helpful when I'm doing anything with regex: http://www.gskinner.com/RegExr/desktop/ It doesn't help with your actual issue, but it could be useful going forward.

Upvotes: 0

Michael Carman

Reputation: 30831

The [...] represents a character class, so this:

[(\*\*)|(\r\n\r\n)]

Is effectively the same as this:

[()*|\r\n]

i.e. it matches a single character (one "*" in your sample), and the "|" is a literal pipe rather than an alternation.
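To make the difference concrete, here is a minimal sketch using Python's `re` module rather than ColdFusion (an assumption made purely for illustration; the two engines are not identical, but both treat `[...]` as a character class). The `sample` string and variable names are made up for the example.

```python
import re

sample = "make things **bold** or //italic//"

# Inside [...] the parentheses and the | are literal set members, so this
# tail matches exactly ONE character out of: ( ) * | \r \n
char_class = r"\*\*(.*?)[(\*\*)|(\r\n\r\n)]"

# A (non-capturing) group with | is a real alternation between two
# complete sequences: a closing ** or a blank line.
alternation = r"\*\*(.*?)(?:\*\*|\r\n\r\n)"

with_class = re.sub(char_class, r"<strong>\1</strong>", sample)
with_alternation = re.sub(alternation, r"<strong>\1</strong>", sample)

print(with_class)        # make things <strong>bold</strong>* or //italic//
print(with_alternation)  # make things <strong>bold</strong> or //italic//
```

The stray `*` in the first result is the second star of the closing `**`, which the one-character class never consumed. That is the same artifact shown in the question's output.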

Another problem is that your match consumes the double line break, so it is replaced along with the rest of the match. Even if your match succeeded, you would end up merging paragraphs. You need to either restore the line break in the replacement or not consume it in the first place. I'd use a positive lookahead to do the latter.

In Perl I'd write it this way:

$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;

Taking a wild guess, the ColdFusion probably looks like this:

REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
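As a rough check of the lookahead idea, the sketch below again uses Python's `re` as a stand-in (an assumption; it has not been run against ColdFusion). `re.DOTALL` plays the role of Perl's `/s` flag so that `.` can cross line breaks, and the `text` value is a shortened, hypothetical version of the question's sample with `\r\n` line endings.

```python
import re

text = (
    "extends across line breaks: **bold,\r\n"
    "this is still bold.\r\n"
    "\r\n"
    "Not bold."
)

# Plain alternation: the blank line is part of the match and gets replaced.
consuming = r"\*\*(.*?)(?:\*\*|\r\n\r\n)"

# Lookahead: the match stops in front of the blank line without consuming it.
lookahead = r"\*\*(.*?)(?:\*\*|(?=\r\n\r\n))"

merged = re.sub(consuming, r"<strong>\1</strong>", text, flags=re.DOTALL)
preserved = re.sub(lookahead, r"<strong>\1</strong>", text, flags=re.DOTALL)

print(repr(merged))     # paragraphs run together after </strong>
print(repr(preserved))  # the \r\n\r\n survives after </strong>
```

Note that neither pattern matches an unclosed `**` that runs to the end of the input with no blank line after it; that case would need its own alternative.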

Upvotes: 6

Goyuix

Reputation: 24330

You really should change your

(.*?) 

to something like

[^*]*?

to match any character except the *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when trying to balance matching delimiters like the double star or HTML start/end tags, to explicitly exclude them from your match set for the inner text.

*Disclaimer, I didn't test this in ColdFusion for the nuances of the regex engine - but the idea should hold true.
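A small sketch of the negated-class approach, written with Python's `re` as an assumed stand-in for the ColdFusion engine. The pattern and inputs are illustrative only. It shows the trade-off: the class keeps each match inside its own pair of `**`, but the inner text can then never contain a single `*`.

```python
import re

# Inner text may be anything except a star, so a match cannot run past
# the nearest closing **.
negated = r"\*\*([^*]*)\*\*"

simple = re.sub(negated, r"<strong>\1</strong>", "**bold** and **more**")
with_star = re.sub(negated, r"<strong>\1</strong>", "**2*3** is six")

print(simple)     # <strong>bold</strong> and <strong>more</strong>
print(with_star)  # **2*3** is six  (left unchanged: the inner * blocks the match)
```

Also note that `[^*]` matches line breaks, so on its own this pattern does not stop at paragraph boundaries.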

Upvotes: 1

Kieveli

Reputation: 11075

I always use a regex web page. It seems like I start from scratch every time I use regex.

Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.

Getting closer with this:

\*\*(.*?)\*\*|//(.*?)//

The tricky part is the //** or **//

OK, first checking for **//both//**, then //**both**//, then **bold**, then //italic//:

\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//

Upvotes: 0
