How to check for key value metadata in markdown

Question

I need to check if my input, formatted using markdown, has key-value pair metadata at the beginning, and then insert text after the whole metadata block.

I look for a : in the first line and if found, split the input string at the first newline and add my stuff.

Now, if markdown_content.splitlines()[0].find(':') >= 0: obviously gives a false positive when there's no metadata at the beginning but the first line contains a : for some other reason.

Examples

Input with metadata:

page title: fancypagetitle
something else: another value

# Heading

Text

Input without metadata, but with a :

This is a [link](http://www.stackoverflow.com)

# Heading

Text

My question is: How do I check whether a metadata block is present and, if it is, add something between the metadata and the remaining markdown?

Definition of metadata

The keywords are case-insensitive and may consist of letters, numbers, underscores and dashes and must end with a colon. The values consist of anything following the colon on the line and may even be blank.

If a line is indented by 4 or more spaces, that line is assumed to be an additional line of the value for the previous keyword. A keyword may have as many lines as desired.

The first blank line ends all meta-data for the document. Therefore, the first line of a document must not be blank. All meta-data is stripped from the document prior to any further processing by Markdown.

Source: https://pythonhosted.org/Markdown/extensions/meta_data.html

Waylan · Accepted Answer

Have you considered looking at the source code for the meta data extension to see how it's done?

The regex used is:

META_RE = re.compile(r'^[ ]{0,3}(?P<key>[A-Za-z0-9_-]+):\s*(?P<value>.*)')

Of course there is also the regex for secondary lines:

META_MORE_RE = re.compile(r'^[ ]{4,}(?P<value>.*)')

Note that those regular expressions are much more specific than your check and are much less likely to produce a false positive. The extension then splits the document into lines, loops through them comparing each line against those regexes, and breaks out of the loop on the first line that does not match (which may or may not be a blank line).
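The loop described above could be sketched roughly as follows. This is not the extension's own code, only a minimal illustration that reuses the two patterns; the helper names metadata_end and insert_after_metadata are made up for this example, and returning the input unchanged when no metadata is found is an assumption about what you want.

```python
import re

# The two patterns quoted above from the meta-data extension.
META_RE = re.compile(r'^[ ]{0,3}(?P<key>[A-Za-z0-9_-]+):\s*(?P<value>.*)')
META_MORE_RE = re.compile(r'^[ ]{4,}(?P<value>.*)')


def metadata_end(lines):
    """Count the leading lines that look like metadata (0 means none).

    Hypothetical helper: stops at the first line that is neither a
    'key: value' line nor an indented continuation of a previous key.
    """
    count = 0
    for line in lines:
        if META_RE.match(line):
            count += 1
        elif count and META_MORE_RE.match(line):
            # A continuation line only counts once a key has been seen.
            count += 1
        else:
            break
    return count


def insert_after_metadata(markdown_content, extra):
    """Insert `extra` after the metadata block, separated by a blank line.

    Hypothetical helper: returns the input unchanged if no metadata
    block is detected.
    """
    lines = markdown_content.splitlines(keepends=True)
    end = metadata_end(lines)
    if end == 0:
        return markdown_content
    head = ''.join(lines[:end])
    if not head.endswith('\n'):
        head += '\n'
    # The blank line ends the metadata, so `extra` is ordinary Markdown.
    return head + '\n' + extra + '\n' + ''.join(lines[end:])
```

One thing to watch for: the key pattern only allows letters, numbers, underscores and dashes, so a key containing a space, such as "page title", would not be recognised by META_RE. A key like "page_title" or "title" would. The link line from the second example is rejected for the same reason, since "This" is followed by a space rather than a colon.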

You may notice in that code a new feature which will be available in the next release: support is being added for optional YAML-style delimiters. If you are comfortable using the latest (unreleased) development code, you could wrap your meta data in YAML delimiters, which might make it a little easier to find the end of the meta data.

For example, your example document above would then look like this (note that I used the optional end-specific delimiter (...), which more clearly marks the end):

---
page title: fancypagetitle
something else: another value
...

# Heading

Text
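With delimiters in place, finding the block boundaries could look something like the sketch below. The patterns BEGIN_RE and END_RE and the helper split_delimited_metadata are assumptions written for this illustration, not the extension's implementation, and the sketch does not validate that the lines between the delimiters are actually key-value pairs.

```python
import re

# Assumed delimiter patterns for this sketch: an opening '---' line and
# a closing '---' or '...' line, each optionally followed by whitespace.
BEGIN_RE = re.compile(r'^-{3}\s*$')
END_RE = re.compile(r'^(-{3}|\.{3})\s*$')


def split_delimited_metadata(text):
    """Return (metadata_lines, body_lines).

    Hypothetical helper: metadata_lines is empty when the document does
    not start with an opening delimiter or has no closing delimiter.
    """
    lines = text.splitlines()
    if not lines or not BEGIN_RE.match(lines[0]):
        return [], lines
    for index in range(1, len(lines)):
        if END_RE.match(lines[index]):
            # Exclude both delimiter lines from the returned metadata.
            return lines[1:index], lines[index + 1:]
    # Opening '---' without a closing delimiter: treat the whole
    # document as ordinary Markdown.
    return [], lines
```

New text can then be placed at the start of body_lines before joining everything back together.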

That said, you would still need to be careful that you didn't get a false match (a horizontal rule, for example). I suppose either way you would really need to re-implement everything that is in the meta data extension for your own needs. Of course, it is open source, so you can, as long as you honor the license.

Sorry, but I can't give you a timeline on when the next release will happen for sure.

Oh, and it may also help to look at the description of this feature provided by MultiMarkdown which inspired the feature in Python-Markdown. That might give you a clearer picture of what might comprise meta-data.
