Reputation: 5290
I am new to regex and playing around with writing regex to match markdown syntaxes, particularly italic text like:
this is markdown with some *italic text*
After writing some naive implementations I found this regex which seems to do the job quite nicely (dealing with edge-cases) and matches the entire string:
(?<!\*)\*([^ ][^*\n]*?)\*(?!\*)
However, I don't want to match the entire string - I only want to match the beginning and end *
characters (so that I can do some special formatting to those characters). How might I go about doing that?
The tricky thing is that I only want to the match the *
characters when the rest of the string matches the correct format of a string in italics (i.e. meets the requirements of that regex above). So a simple regex like (\*|\*)
isn't going to cut it.
Upvotes: 2
Views: 93
Reputation: 163352
Except from using a capturing group for the asterix at the start and at the end, you can add an asterix to the first negated character class to prevent matching a double **
.
Note that as pointed out by @toto you don't really need the capturing groups around the asterix (\*)
. You can also match them and add the replacement characters before and after the single capturing group for the content in the middle.
It also means that it should match at least a single character other then an asterix.
You don't have to make the first character class non greedy *?
as it can not cross the *
boundary that follows.
(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)
If there can also not be a space before the ending asterix, you can repeat matching a space followed by matching any non whitespace char except an asterix (?: [^*\s]+)*
The \r\n
in the negated character class is to prevent newline boundaries which are also matched by \s
. If that should not be the case, you can replace that by a space or tab and space.
(?<!\*)(\*)([^*\s]+(?: [^*\s]+)*)(\*)(?!\*)
Upvotes: 2