Skoota

Modifying regex to match beginning and end characters

I am new to regex and am experimenting with writing regexes that match Markdown syntax, particularly italic text like:

this is markdown with some *italic text*

After writing some naive implementations, I found this regex, which seems to do the job quite nicely (it handles the edge cases) and matches the entire italic string, asterisks included:

(?<!\*)\*([^ ][^*\n]*?)\*(?!\*)
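For reference, here is how this pattern behaves in Python's re module (Python is only an assumed host language; the question does not name one). The whole match, group 0, includes both asterisks, while group 1 holds just the text between them:

```python
import re

# The question's pattern: a lone * (not preceded by another *), content that
# does not start with a space, then a lone * (not followed by another *).
ITALIC = re.compile(r"(?<!\*)\*([^ ][^*\n]*?)\*(?!\*)")

text = "this is markdown with some *italic text*"
m = ITALIC.search(text)
print(m.group(0))  # *italic text*
print(m.group(1))  # italic text
```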

However, I don't want to match the entire string; I only want to match the opening and closing * characters (so that I can apply some special formatting to those characters). How might I go about doing that?

The tricky part is that I only want to match the * characters when the rest of the string is correctly formatted as italic text (i.e. meets the requirements of the regex above). So a simple regex like (\*|\*) isn't going to cut it.
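To see the problem concretely, here is a small Python sketch on a made-up sample string. An alternation of two identical branches like (\*|\*) is equivalent to a plain \*, so it finds every asterisk, including the four in a **bold** marker and an unpaired one:

```python
import re

# Two identical branches: this behaves exactly like the pattern \*.
NAIVE = re.compile(r"(\*|\*)")

text = "a *real* span, a **bold** word and a lone * star"
# Index of every asterisk found, with no regard to the surrounding text.
positions = [m.start() for m in NAIVE.finditer(text)]
print(positions)  # [2, 7, 17, 18, 23, 24, 42]
```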


Answers (2)

The fourth bird

Apart from using a capturing group for the asterisk at the start and at the end, you can add an asterisk to the first negated character class to prevent matching a double **.

Note that, as pointed out by @toto, you don't really need the capturing groups around the asterisks (\*). You can also match them without capturing and put the replacement characters before and after the single capturing group for the content in the middle.
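A minimal Python sketch of that replacement approach, assuming the goal is to wrap the italic content in other markers. The pattern is the one from this answer with the groups around the asterisks removed; the helper mark_italics and the <em>/</em> markers are hypothetical placeholders for whatever formatting you actually need:

```python
import re

# Asterisks are matched but not captured; group 1 is the italic content.
ITALIC = re.compile(r"(?<!\*)\*([^*\s][^*\r\n]*)\*(?!\*)")

def mark_italics(text, open_mark, close_mark):
    # Replace each match with: open_mark + content + close_mark.
    return ITALIC.sub(lambda m: open_mark + m.group(1) + close_mark, text)

print(mark_italics("some *italic text* here", "<em>", "</em>"))
# some <em>italic text</em> here
print(mark_italics("**bold**", "<em>", "</em>"))
# **bold**  (unchanged: [^*\s] cannot match the second asterisk)
```

Passing a function to sub rather than a template string means the markers are inserted literally, so backslashes or group references in them need no escaping.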

It also means that the content must contain at least one character other than an asterisk.

You don't have to make the repeated character class non-greedy (*?), as it cannot cross the * boundary that follows.

(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)

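Because the asterisks are in groups 1 and 3, the match object gives you their exact positions, and a replacement function can restyle just those two characters. A Python sketch on a made-up sample string; delimiter_positions, style_delimiters and the [*] styling are hypothetical names and output chosen for illustration. It also shows why adding * to the first character class matters: on the same string, the question's original pattern reports a spurious span starting at the **bold** marker.

```python
import re

# This answer's pattern: groups 1 and 3 are the asterisks, group 2 the content.
ITALIC = re.compile(r"(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)")
# The question's original pattern, kept for comparison.
QUESTION_PATTERN = re.compile(r"(?<!\*)\*([^ ][^*\n]*?)\*(?!\*)")

def delimiter_positions(text):
    """Return (opening, closing) string indexes of each italic * pair."""
    return [(m.start(1), m.start(3)) for m in ITALIC.finditer(text)]

def style_delimiters(text, style):
    """Apply style() to the two asterisks only; the content is kept as-is."""
    return ITALIC.sub(
        lambda m: style(m.group(1)) + m.group(2) + style(m.group(3)), text
    )

text = "a *real* span, a **bold** word and a lone * star"
print(delimiter_positions(text))  # [(2, 7)]
print(style_delimiters(text, lambda s: "[" + s + "]"))
# a [*]real[*] span, a **bold** word and a lone * star
print([m.group(0) for m in QUESTION_PATTERN.finditer(text)])
# ['*real*', '** word and a lone *']
```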

If there also must not be a space before the closing asterisk, you can repeatedly match a space followed by one or more non-whitespace characters other than an asterisk: (?: [^*\s]+)*

The \r\n in the negated character class prevents the match from crossing line boundaries; \s excludes newlines as well. If you don't want that, you can replace \s with a space, or with a space and a tab.

(?<!\*)(\*)([^*\s]+(?: [^*\s]+)*)(\*)(?!\*)

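A quick Python comparison of the two patterns in this answer, on made-up sample strings. The stricter pattern rejects a space before the closing asterisk; because words may only be joined by single spaces, it also rejects two consecutive spaces inside the span. Neither pattern matches across a line break:

```python
import re

# First pattern in this answer: the content may end with a space.
LOOSE = re.compile(r"(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)")
# Stricter pattern: runs of non-space, non-* characters joined by single
# spaces, so the content can neither start nor end with whitespace.
STRICT = re.compile(r"(?<!\*)(\*)([^*\s]+(?: [^*\s]+)*)(\*)(?!\*)")

samples = ["*italic text*", "*trailing space *", "*a  b*", "*two\nlines*"]
# Map each sample to (matched by LOOSE, matched by STRICT).
results = {s: (bool(LOOSE.search(s)), bool(STRICT.search(s))) for s in samples}

for s, (loose, strict) in results.items():
    print(f"{s!r:22} loose={loose} strict={strict}")
```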

dawg

Just turn the first and second \* into capturing groups, and you can then change them at will:

(?<!\*)(\*)([^ ][^*\n]*?)(\*)(?!\*)

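With both asterisks captured, a replacement template can refer to them as \1 and \3 and to the content as \2. A Python sketch, where the span wrapper and the md-delim class name are made up to stand in for "special formatting":

```python
import re

# dawg's pattern: the question's regex with each asterisk in its own group.
PATTERN = re.compile(r"(?<!\*)(\*)([^ ][^*\n]*?)(\*)(?!\*)")

text = "this is markdown with some *italic text*"
# In the template, \1 and \3 are the asterisks and \2 is the content.
result = PATTERN.sub(
    r"<span class='md-delim'>\1</span>\2<span class='md-delim'>\3</span>", text
)
print(result)
# this is markdown with some <span class='md-delim'>*</span>italic text<span class='md-delim'>*</span>
```

Keep in mind that this pattern retains the question's [^ ] as the first content character, so it can begin a match at a ** pair, with [^ ] consuming the second asterisk as content.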
