Reputation: 1264
I'm sure someone has already asked this question, but I don't know what words to search for on Google to find the answer.
I have to "translate" a text with markup to HTML (or RTF or XAML). The markup for "bold" is *. If I want the bold text to contain a literal *, I have to escape it with a backslash.
So, the marked-up text...
This is *ju\*st* a test.
...should translate to "This is ju*st a test.", with "ju*st" in bold.
I'm looking for a regex pattern that finds all the spans in my marked-up text that should be "translated" to bold.
Right now I'm stuck with this one: a literal star, followed by one or more characters that are not a star (as few as possible), followed by a literal star:
\*[^*]+?\*
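A quick check in Python (used here only to illustrate the matching behaviour, not the .NET flavor) shows where this pattern stops: [^*] cannot cross any star, escaped or not, so the match ends in the middle of the bold text.

```python
import re

text = r"This is *ju\*st* a test."

# [^*]+? stops at the first star it meets, including the escaped one in \*,
# so the match ends at "\*" instead of at the closing markup star.
print(re.findall(r"\*[^*]+?\*", text))  # ['*ju\\*']
```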
But how can I improve the "one or more characters that are not a star" part so that it doesn't stop at stars preceded by a backslash?
I want to use this regex in a .NET project, in case the regex flavors differ between languages.
Upvotes: 4
Views: 1219
Reputation: 627082
You may use
(?<=(?<!\\)(?:\\{2})*)\*[^\\*]*(?:\\.[^\\*]*)*\*
See the .NET regex demo.
Details
(?<=(?<!\\)(?:\\{2})*) - a positive lookbehind that makes sure the * that follows is not escaped with a \ char. In other words, it matches a location that is immediately preceded by:
  (?<!\\) - a position not preceded by a \ char, followed by
  (?:\\{2})* - zero or more repetitions of double backslashes
\* - a * char
[^\\*]* - zero or more chars other than \ and *
(?: - start of a non-capturing group matching...
  \\. - any char escaped with a \ char (except a newline; compile the pattern with RegexOptions.Singleline to allow any escaped char)
  [^\\*]* - zero or more chars other than \ and *
)* - ...zero or more times
\* - a * char.
Upvotes: 1
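Python's standard re module rejects variable-width lookbehinds like the one above, so the sketch below swaps in a related technique instead of porting the pattern directly: one alternative consumes every backslash escape outside a bold span, so the bold alternative can only ever start at an unescaped star, and the bold body reuses the same "any char except \ and *, or \ plus any char" idea. re.DOTALL plays the role of RegexOptions.Singleline here. The to_html and unescape helpers, the <b> tags, and treating any backslash-escaped character as literal are assumptions for illustration only.

```python
import re

# Hypothetical helpers, not part of any library.
# Alternative 1: a backslash escape outside bold text (group 1 = the char).
# Alternative 2: a bold span whose body is built from "any char except \ and *"
# or "\ followed by any char" (group 2 = the raw body).
TOKEN = re.compile(r"\\(.)|\*((?:[^\\*]|\\.)*)\*", re.DOTALL)
ESCAPE = re.compile(r"\\(.)", re.DOTALL)


def unescape(text):
    # Assumption: a backslash makes whatever character follows it literal.
    return ESCAPE.sub(r"\1", text)


def to_html(markup):
    def replace(m):
        if m.group(1) is not None:  # escaped char outside a bold span
            return m.group(1)
        return "<b>" + unescape(m.group(2)) + "</b>"

    return TOKEN.sub(replace, markup)


print(to_html(r"This is *ju\*st* a test."))  # This is <b>ju*st</b> a test.
print(to_html(r"\\*bold*"))  # \<b>bold</b>
```

Because escapes are consumed from left to right, an escaped backslash (\\) does not hide the star after it, which is the case the (?:\\{2})* part of the .NET lookbehind handles. The sketch does not HTML-escape <, > or &, which HTML output would normally also need.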
Reputation: 16730
You want to match from a markup star to another markup star.
In your markup language, a literal star is written not as * but as \*.
In regex, this is written \\\*: a backslash, which must be escaped, then a star, which must be escaped too.
Therefore, you need to specify in your pattern that you're looking for a markup star, as opposed to a literal star.
\*.*[^\\]\*
\* a markup star
.* followed by any number of characters
[^\\]\* then a markup star, that is, one not escaped by a backslash
This is a little off, though, because .* is greedy, so in *ju\*st* *ju\*st* it will match the whole string, from the first star to the last.
In most engines you can use the lazy (non-greedy) version of the star quantifier, *?, instead.
So it becomes:
\*.*?[^\\]\*
\* a markup star
.*? followed by any number of characters, but as few as possible
[^\\]\* then a markup star, that is, one not escaped by a backslash
A quick test in Python:
>>> import re
>>> s = r"*ju\*st* *ju\*st*"
>>> re.match(r"\*.*[^\\]\*", s)
<re.Match object; span=(0, 17), match='*ju\\*st* *ju\\*st*'>
>>> re.match(r"\*.*?[^\\]\*", s)
<re.Match object; span=(0, 8), match='*ju\\*st*'>
If your regex engine does not support lazy quantifiers, you'll need to spell this behaviour out explicitly:
\*([^*]|\\\*)*[^\\]\*
\* a markup star
( then either...
[^*] ...any character but a star...
| ...or...
\\\* ...a star prefixed by a backslash, i.e. a literal star
)* any number of times
[^\\]\* then a markup star
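A quick check in Python, in the same spirit as the session above, that the lazy-free pattern finds the same two bold spans as the lazy one on the example string. finditer with group() is used because findall would return only the capturing group of the second pattern, not the whole match.

```python
import re

s = r"*ju\*st* *ju\*st*"
lazy = re.compile(r"\*.*?[^\\]\*")
no_lazy = re.compile(r"\*([^*]|\\\*)*[^\\]\*")

# group() returns the whole match; findall would return only the
# last repetition of the capturing group in no_lazy.
lazy_matches = [m.group() for m in lazy.finditer(s)]
no_lazy_matches = [m.group() for m in no_lazy.finditer(s)]
print(lazy_matches)     # ['*ju\\*st*', '*ju\\*st*']
print(no_lazy_matches)  # ['*ju\\*st*', '*ju\\*st*']
```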
Upvotes: 1