Building a Markdown parser. Is it possible to detect links without detecting underscores in them?

Question

I'm trying to write a basic Markdown parser, and I want to build a regular expression that can detect links and emphasis.

In Markdown, links look like [text](URL), and emphasis (italics) looks like *text* or _text_.

I have no problem detecting emphasis or links on their own, but when a link's URL contains underscores, such as http://example.com/link_to_article, my parser treats _to_ as an attempt at emphasis.
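To make the problem concrete, here is a minimal reproduction. The EMPHASIS pattern and the render_emphasis helper are assumptions made up for illustration, not the actual parser code; they only show how an emphasis pattern with no knowledge of links ends up matching inside the URL.

```python
import re

# Hypothetical naive emphasis pattern: text wrapped in a matching * or _.
EMPHASIS = re.compile(r"([*_])(.+?)\1")


def render_emphasis(text: str) -> str:
    """Wrap every emphasis match in <em> tags, with no awareness of links."""
    return EMPHASIS.sub(r"<em>\2</em>", text)


source = "[article](http://example.com/link_to_article)"
result = render_emphasis(source)
print(result)
```

The _to_ in the URL is rewritten to an em element, which corrupts the link target.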

How do I stop this?

My first attempt was to require that no characters come directly before the first underscore or directly after the second. But intraword emphasis is valid, as seen here on Stack Overflow, so examples like intere_stin_g must still work, which rules that idea out.
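A rough sketch of that first attempt, assuming it was expressed as lookarounds on word characters (the STRICT_EMPHASIS pattern and find_emphasis helper are hypothetical names for illustration). It does stop matching inside the URL, but it also drops the intraword case:

```python
import re

# Hypothetical "strict" pattern: the opening marker must not follow a word
# character, and the closing marker must not be followed by one.
STRICT_EMPHASIS = re.compile(r"(?<!\w)([*_])(.+?)\1(?!\w)")


def find_emphasis(text: str) -> list[str]:
    """Return the inner text of every emphasis span the strict pattern finds."""
    return [m.group(2) for m in STRICT_EMPHASIS.finditer(text)]


in_url = find_emphasis("http://example.com/link_to_article")
standalone = find_emphasis("some _text_ here")
intraword = find_emphasis("intere_stin_g")

print(in_url, standalone, intraword)
```

The URL no longer produces a match and standalone emphasis still works, but intere_stin_g yields nothing, so the boundary rule is too strict.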

So how would I accomplish this?
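One direction that seems workable, offered only as a sketch under assumptions: put links and emphasis in a single alternation with the link branch first, so a whole [text](URL) is consumed before the emphasis branch can look inside it. The TOKEN pattern, the render_inline helper, and the HTML output format are all hypothetical choices for illustration, and the pattern ignores most of real Markdown (titles, nested brackets, escapes).

```python
import re

# Single alternation: the link branch is tried first at each position, so a
# complete [text](URL) is consumed before emphasis can match inside the URL.
TOKEN = re.compile(
    r"\[(?P<label>[^\]]*)\]\((?P<url>[^)\s]*)\)"  # [text](URL)
    r"|(?P<mark>[*_])(?P<em>.+?)(?P=mark)"        # *text* or _text_
)


def render_inline(text: str) -> str:
    """Render links and emphasis; URLs are copied through untouched."""

    def replace(match: re.Match) -> str:
        if match.group("url") is not None:
            # Emphasis is still allowed inside the link label, not the URL.
            label = render_inline(match.group("label"))
            return f'<a href="{match.group("url")}">{label}</a>'
        return f"<em>{match.group('em')}</em>"

    return TOKEN.sub(replace, text)


rendered = render_inline(
    "[article](http://example.com/link_to_article) is intere_stin_g"
)
print(rendered)
```

This keeps intraword emphasis working while leaving the URL alone. It is not complete: an unmatched underscore that appears before a link can still pair with an underscore inside the URL, so a real parser would more likely tokenize links in a separate pass first.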

