john c. j.

Reputation: 1175

Regex to parse image link in Markdown

I'm trying to create a regex to parse Markdown image links.

regex:

!\[[^\]]*\]\((.*)\s"(.*[^"])"?\s*\)

Test:

foo

![](image 2.png "hello world")

bar

Group 1 will be image 2.png, and group 2 will be hello world.

The problem appears when I try to parse an image link without a title:

foo

![](image 2.png)

bar

How should I modify the regex to make it work in both cases?
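To reproduce the problem, here is a minimal sketch that runs the question's regex against both test inputs. Python's `re` module is an assumption, since the question doesn't name a language. The titled input yields the two expected groups, while the untitled one returns no match at all, because the pattern requires `\s"` before the closing parenthesis.

```python
import re

# The question's pattern, unchanged: the title part \s"..." is mandatory.
QUESTION_RE = re.compile(r'!\[[^\]]*\]\((.*)\s"(.*[^"])"?\s*\)')

with_title = 'foo\n\n![](image 2.png "hello world")\n\nbar'
without_title = 'foo\n\n![](image 2.png)\n\nbar'

m = QUESTION_RE.search(with_title)
print(m.groups())  # ('image 2.png', 'hello world')

# There is no quote character anywhere, so the mandatory \s" never matches.
print(QUESTION_RE.search(without_title))  # None
```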

Upvotes: 16

Views: 8960

Answers (3)

Divine Hycenth

Reputation: 660

Here's a complete regex that matches both the alt text and the image URL in a Markdown file, using named capture groups:

(?<alt>!\[[^\]]*\])\((?<filename>.*?)(?=\"|\))\)
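A quick check of this pattern in Python (an assumption; Python's `re` spells named groups `(?P<name>...)` rather than `(?<name>...)`). On the untitled link, `filename` is `image 2.png`. Note that `alt` captures the whole `![...]` bracket rather than just the text, and that when a title is present it ends up inside `filename`.

```python
import re

# The answer's pattern, translated to Python's (?P<name>...) group syntax.
ALT_FILENAME_RE = re.compile(r'(?P<alt>!\[[^\]]*\])\((?P<filename>.*?)(?=\"|\))\)')

m = ALT_FILENAME_RE.search('foo\n\n![](image 2.png)\n\nbar')
print(m.group('alt'), m.group('filename'))  # ![] image 2.png

# At the opening quote the lookahead succeeds but the following \) fails,
# so the lazy group keeps extending up to the closing parenthesis.
m = ALT_FILENAME_RE.search('foo\n\n![](image 2.png "hello world")\n\nbar')
print(m.group('filename'))  # image 2.png "hello world"
```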

Upvotes: 2

Doug

Reputation: 11

The previously accepted answer only accounts for standalone images. However, an image can also be used as the text of a hyperlink, resulting in a nested image reference such as:

[![alt-text](http://example.com/image.png "image title")](http://example.com/some?target)

A more complete regex pattern would look like this:

\[?(!)(?'alt'\[[^\]\[]*\[?[^\]\[]*\]?[^\]\[]*)\]\((?'url'[^\s]+?)(?:\s+(["'])(?'title'.*?)\4)?\)

This pattern also provides named groups for the other information you might want about the image, such as the alt text or the title.
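Here is a hedged check of this pattern in Python (an assumption; Python's `re` needs `(?P<name>...)` instead of `(?'name'...)`, and `\4` still refers to the opening quote because groups are numbered left to right). On a plain titled image all three named groups are filled, although `alt` keeps the opening `[`. On the linked-image example, Python matches from the outer `[`: the optional `\]?` lets `alt` run through the inner `(...)`, so `url` holds the link target rather than the image URL. It is worth testing against your own engine and inputs.

```python
import re

# Doug's pattern with (?'name'...) rewritten as Python's (?P<name>...).
# Groups: 1 = "!", 2 = alt, 3 = url, 4 = opening quote, 5 = title;
# \4 requires the closing quote to match the opening one.
IMAGE_RE = re.compile(
    r"""\[?(!)(?P<alt>\[[^\]\[]*\[?[^\]\[]*\]?[^\]\[]*)\]\((?P<url>[^\s]+?)(?:\s+(["'])(?P<title>.*?)\4)?\)"""
)

plain = '![alt-text](http://example.com/image.png "image title")'
m = IMAGE_RE.search(plain)
print(m.group('alt', 'url', 'title'))
# ('[alt-text', 'http://example.com/image.png', 'image title')

nested = '[![alt-text](http://example.com/image.png "image title")](http://example.com/some?target)'
m = IMAGE_RE.search(nested)
print(m.group('alt', 'url'))
# ('[alt-text](http://example.com/image.png "image title")', 'http://example.com/some?target')
```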

Upvotes: 1

Scott Weaver

Reputation: 7351

You have to make the second group optional, since the title isn't always there. You can also get slightly better readability with named groups, something like this perhaps:

!\[[^\]]*\]\((?<filename>.*?)(?=\"|\))(?<optionalpart>\".*\")?\)

https://regex101.com/r/cSbfvF/3/
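A small Python sketch of the named-group version (Python and the helper name `parse` are assumptions for illustration; the named groups are rewritten as `(?P<name>...)`). Both inputs now match. With a title, `filename` keeps the trailing space before the quote and `optionalpart` includes the quotes, so a little post-processing may be needed.

```python
import re

# The named-group pattern in Python syntax; the quoted title part is optional.
NAMED_RE = re.compile(r'!\[[^\]]*\]\((?P<filename>.*?)(?=\"|\))(?P<optionalpart>\".*\")?\)')

def parse(text):
    """Hypothetical helper: return (filename, optionalpart) for the first image."""
    m = NAMED_RE.search(text)
    return m.group('filename'), m.group('optionalpart')

print(parse('foo\n\n![](image 2.png "hello world")\n\nbar'))  # ('image 2.png ', '"hello world"')
print(parse('foo\n\n![](image 2.png)\n\nbar'))  # ('image 2.png', None)
```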

Alternatively, your original regex, fixed up, would be:

!\[[^\]]*\]\((.*?)\s*("(?:.*[^"])")?\s*\)

https://regex101.com/r/u2DwY2/2/
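A minimal Python sketch of the fixed regex (Python and the helper `image_parts` are assumptions for illustration). Group 1 is the filename in both cases; group 2 is `None` when there is no title and otherwise includes the surrounding quotes, which the helper trims off.

```python
import re

# The fixed-up pattern: lazy filename, then an optional quoted title.
FIXED_RE = re.compile(r'!\[[^\]]*\]\((.*?)\s*("(?:.*[^"])")?\s*\)')

def image_parts(text):
    """Hypothetical helper: (filename, title) for the first image, or None."""
    m = FIXED_RE.search(text)
    if m is None:
        return None
    filename, quoted_title = m.groups()
    # Group 2 keeps the surrounding quotes, so strip them off.
    title = quoted_title[1:-1] if quoted_title else None
    return filename, title

print(image_parts('foo\n\n![](image 2.png "hello world")\n\nbar'))  # ('image 2.png', 'hello world')
print(image_parts('foo\n\n![](image 2.png)\n\nbar'))  # ('image 2.png', None)
```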

Upvotes: 20
