rhetoric101

Reputation: 19

Regex to find Markdown headings that don't end with anchors

I am trying to search through Markdown files in VS Code for headings that do not have an anchor at the end of the line, like this: [#some-anchor-name]. To clarify, here is the shape of the headings I'm looking for:

Here are some regexes I've tried:

This one almost works but it expects a literal space at the end of the heading with the missing anchor, which won't always be the case:

^#{1,4}.*\s(?!\[#.*\])$

The regex above matches on ## My Big Heading (note the space after the heading) which made me think I was going in the right direction.

I tried removing the search for the literal space just prior to the anchor and it matches on all my headings--even ones with anchors:

^#{1,4}.*(?!\[#.*\])$

For example, the regex above matches on ## My Big Heading and ## My Big Heading [#my-big-anchor]

To summarize, I'd like my regex to find line #2 below:

## My Big Heading [#my-big-anchor]
## My Big Heading

I looked at a variety of discussions on matching strings that don't have a particular pattern, but since I'm not matching a particular word at the end of the headings, they don't seem to apply:

Upvotes: 1

Views: 192

Answers (2)

The fourth bird

Reputation: 163517

With your current pattern, the .*\s first matches until the end of the line, and then backtracks to the nearest whitespace char, where the lookahead asserts that [#...] is not directly to the right.

For ## My Big Heading [#my-big-anchor], that assertion fails at the space right before the anchor, so the engine backtracks further. The assertion is then true for the space in between Big and Heading, but the $ anchor right after it cannot match there.
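To see this behaviour outside of VS Code, here is a small sketch using Python's re module (an assumption on my side, since VS Code uses a different regex engine, but both patterns behave the same way on these inputs). The third sample line, with a trailing space, is added only for illustration.

```python
import re

# Sample headings; the last one has a trailing space (added for illustration).
lines = [
    "## My Big Heading [#my-big-anchor]",
    "## My Big Heading",
    "## My Big Heading ",
]

# The two patterns from the question, unchanged.
first_attempt = re.compile(r"^#{1,4}.*\s(?!\[#.*\])$")
second_attempt = re.compile(r"^#{1,4}.*(?!\[#.*\])$")

# \s must sit directly before $, so only a line ending in whitespace matches.
first_hits = [bool(first_attempt.search(s)) for s in lines]

# .* runs to the end of the line, where nothing follows, so the
# negative lookahead is always true and every heading matches.
second_hits = [bool(second_attempt.search(s)) for s in lines]

print(first_hits)   # [False, False, True]
print(second_hits)  # [True, True, True]
```

In both cases the lookahead is checked at a position where the anchor can no longer be seen, which is why it has to be moved to a position before the heading text.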


You could write the pattern with the end of the string in the lookahead assertion:

^#{1,4}\s(?!.*\[#.*\]$).*

Explanation

  • ^ Start of string
  • #{1,4} Match 1-4 times a # char
  • \s Match a whitespace char
  • (?!.*\[#.*\]$) Negative lookahead, assert from the current position that the string does not end with [#...]
  • .* Match the rest of the line
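As a quick check of the pattern above, here is a minimal sketch in Python, assuming re.MULTILINE so that ^ and $ work per line, similar to a line-based search in an editor. The sample text is made up for illustration; only the two "My Big Heading" lines come from the question.

```python
import re

# Hypothetical Markdown content for illustration.
text = "\n".join([
    "# Title [#title]",
    "## My Big Heading [#my-big-anchor]",
    "## My Big Heading",
    "Some body text, not a heading",
    "### Notes on [brackets] in titles",
])

# The lookahead runs right after the whitespace that follows the # chars,
# so it can scan the whole rest of the line for a trailing [#...].
pattern = re.compile(r"^#{1,4}\s(?!.*\[#.*\]$).*", re.MULTILINE)

missing = [m.group(0) for m in pattern.finditer(text)]
print(missing)
# ['## My Big Heading', '### Notes on [brackets] in titles']
```

Note that a heading with square brackets in its text but no [#...] at the end is still reported, because the lookahead only rejects lines that end with the anchor.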


Upvotes: 2

kDjakman

Reputation: 116

What you should avoid is using .* (zero or more of any character except a newline); use [^\[]* instead (zero or more of any character except an opening square bracket).

This is because the .* pattern does not play nice with your negative look-ahead.

As long as your normal headings do not contain an opening square bracket character, you can use the following simple regex: ^#{1,4}[^\[]+$. It does not need a negative look-ahead assertion.
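Here is a rough sketch of how this pattern behaves, written in Python as an assumption (the question is about VS Code search). Each line is tested on its own because in Python [^\[] also matches newline characters. The third sample heading is hypothetical and shows the limitation mentioned above.

```python
import re

# Character-class pattern: no opening square bracket allowed after the # chars.
pattern = re.compile(r"^#{1,4}[^\[]+$")

samples = [
    "## My Big Heading [#my-big-anchor]",
    "## My Big Heading",
    "### Notes on [brackets] in titles",  # hypothetical heading, no anchor
]

# Test line by line so the character class cannot run across line breaks.
results = [bool(pattern.search(s)) for s in samples]
print(results)  # [False, True, False]
```

The last heading has no anchor but is not found, since any opening square bracket stops the match.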

Upvotes: 0
