Reputation: 1175
I'm trying to create a regex to parse Markdown image links.
regex:
!\[[^\]]*\]\((.*)\s"(.*[^"])"?\s*\)
Test (link to live demo):
foo
![](image 2.png "hello world")
bar
Group 1 will be image 2.png, and group 2 will be hello world.
The problem appears when I try to parse a link without title:
foo
![](image 2.png)
bar
How should I modify the regex to make it work in both cases?
Upvotes: 16
Views: 8960
Reputation: 660
Here's a complete regexp to match both the alt text and the image URL in a Markdown file, using named capture groups:
(?<alt>!\[[^\]]*\])\((?<filename>.*?)(?=\"|\))\)
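As a rough sketch of how this pattern behaves, here it is in Python. The translation is an assumption on my part: Python's re module writes named groups as (?P<name>...), and the sample strings and variable names are made up for illustration. Note that the alt group captures the surrounding ![ and ] as well, and that when a title is present the filename group runs on to the closing parenthesis.

```python
import re

# Same pattern as above, with (?<name>...) rewritten as (?P<name>...) for Python.
ALT_AND_FILE = re.compile(r'(?P<alt>!\[[^\]]*\])\((?P<filename>.*?)(?=\"|\))\)')

# Image without a title (hypothetical sample input).
plain = ALT_AND_FILE.search('foo\n![logo](image 2.png)\nbar')
alt_with_brackets = plain.group('alt')   # includes the leading "![" and trailing "]"
filename = plain.group('filename')

# Image with a title: the lazy filename group extends up to the ")".
titled = ALT_AND_FILE.search('![](image 2.png "hello world")')
titled_filename = titled.group('filename')

print(alt_with_brackets, filename, titled_filename, sep=' | ')
```

So this version seems best suited to images without a title, unless you split the title off the filename afterwards.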
Upvotes: 2
Reputation: 11
The previously accepted answer only accounts for standard images. However, an image can also be wrapped in a hyperlink, resulting in a nested image reference such as:
[![alt-text](http://example.com/image.png "image title")](http://example.com/some?target)
A more complete regex pattern would look like this:
\[?(!)(?'alt'\[[^\]\[]*\[?[^\]\[]*\]?[^\]\[]*)\]\((?'url'[^\s]+?)(?:\s+(["'])(?'title'.*?)\4)?\)
This pattern also provides named groups for all the potential other info you might want about the image, such as "alt text" or "title".
Upvotes: 1
Reputation: 7351
You have to make the second group optional since it's not always there. Also, you can achieve a little bit better readability with named groups, something like this perhaps:
!\[[^\]]*\]\((?<filename>.*?)(?=\"|\))(?<optionalpart>\".*\")?\)
https://regex101.com/r/cSbfvF/3/
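A minimal Python sketch of the named-group version, assuming you translate (?<name>...) to Python's (?P<name>...) spelling. The helper name parse_image is hypothetical. Two details show up when you run it: the lazy filename group keeps the space that precedes the title, and optionalpart includes the surrounding quotes, so the sketch trims both.

```python
import re

# Named-group pattern from above, rewritten with Python's (?P<name>...) syntax.
NAMED = re.compile(
    r'!\[[^\]]*\]\((?P<filename>.*?)(?=\"|\))(?P<optionalpart>\".*\")?\)'
)

def parse_image(line):
    """Return (filename, title) for the first image in line, or None if absent."""
    m = NAMED.search(line)
    if m is None:
        return None
    # The lazy group stops right before the quote, so it keeps the separating space.
    filename = m.group('filename').strip()
    title = m.group('optionalpart')
    if title is not None:
        title = title[1:-1]  # drop the surrounding double quotes
    return filename, title

print(parse_image('![](image 2.png "hello world")'))
print(parse_image('![](image 2.png)'))
```

When the title is missing, the optionalpart group is None rather than an empty string, so check for that before using it.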
Alternatively, your original regex fixed up would be:
!\[[^\]]*\]\((.*?)\s*("(?:.*[^"])")?\s*\)
https://regex101.com/r/u2DwY2/2/
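To check the fixed-up pattern against both cases from the question, here is a small Python sketch. The pattern is unchanged; the variable names and the combined sample text are assumptions for illustration. Group 2 now includes the quotes around the title, so the sketch strips them.

```python
import re

# The fixed-up original pattern, unchanged.
FIXED = re.compile(r'!\[[^\]]*\]\((.*?)\s*("(?:.*[^"])")?\s*\)')

# Both cases from the question in one sample text.
text = 'foo\n![](image 2.png "hello world")\nbar\n![](image 2.png)\n'

results = []
for m in FIXED.finditer(text):
    filename, quoted_title = m.group(1), m.group(2)
    # Group 2 is None when there is no title; otherwise it still has its quotes.
    title = quoted_title.strip('"') if quoted_title else None
    results.append((filename, title))

print(results)
```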
Upvotes: 20