JL Peyret

Reputation: 12174

Markdown parsing/pre-processors - support for enhancing/modifying arbitrary markdown files

I am trying to take .md files from the file system and prepare them for Vuepress-based hosting. To do that, I want to remain in markdown format, adjust some of the file contents, and save it to a differently-named .md. The original file should remain as is on the file system and should remain viewable on the workstation.

Vuepress's build system takes care of the .md => .html transformation, so that is a step I want to avoid doing myself.

I've looked at mistune and Python-markdown but both seem a lot more interested in rendering the Markdown to HTML, a step I want to leave entirely up to Vuepress.

Is there some mode in either to a) read Markdown, b) modify it via user plugins, and c) write it back as Markdown? What about non-Python utilities? I can handle JS or Ruby, though nowhere near as well as Python.

For example:

Vuepress uses Frontmatter (YAML) to qualify what's in a document.

---
title: Blogging Like a Hacker
lang: en-US
---

I want to add a block like this to the front of each file.

Image links need updating

Let's say I have an image in the same directory as the .md file. Markdown viewers can easily display it using the markup below.

### My image:

![](./02.issue.png)

However, the following things need to happen for Vuepress to work:

### My image:

![](/<slug-based-name-for-md-file>/02.issue.png)

where slug-based-name-for-md-file is a unique name for the .md

and the file 02.issue.png needs to be copied to .vuepress/public/<slug-based-name-for-md-file>/02.issue.png.

So, what I need is a hook to process every image reference in the markdown document. I can write it easily, what I am looking for is a parser that tells me what images exist in a markdown file.

Yes, I know that finding images is only a few regexes away, but we do have those big powerful Markdown parsers so I wonder if I've missed something in their documentation. Plus, more nested Markdown structures might not be easy to classify via regex.

Upvotes: 1

Views: 820

Answers (1)

Waylan

Reputation: 42517

> seem a lot more interested in rendering the Markdown to HTML

This is correct. That is what a Markdown parser does: it converts Markdown to HTML.

However, a subset of Markdown parsers are implemented as a two-step process: step one parses the Markdown into an Abstract Syntax Tree (AST), and step two renders that AST to HTML. Generally, the second step can be replaced with an alternate renderer that outputs a different format. If a Markdown renderer exists, then you could output Markdown from the AST. Some implementations which work this way are mistune (Python) and marked (JavaScript), among others. However, AFAIK, neither comes with a Markdown renderer, so you would need to either find a third-party renderer or build your own.
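To make the two-step idea concrete, here is a toy sketch in plain Python. The node shape (a list of dicts with `type`, `text`, `src`, and so on) is invented for illustration and is not the AST of mistune or marked; the point is only that the same tree can be fed to either an HTML renderer or a Markdown renderer.

```python
from html import escape

# A hand-built toy AST. This node layout is an assumption made for the
# example; real parsers each define their own token structure.
ast = [
    {"type": "heading", "level": 3, "text": "My image:"},
    {"type": "image", "src": "./02.issue.png", "alt": "", "title": None},
]


def render_html(nodes):
    """Step two, variant A: turn the tree into HTML."""
    out = []
    for node in nodes:
        if node["type"] == "heading":
            out.append(f"<h{node['level']}>{escape(node['text'])}</h{node['level']}>")
        elif node["type"] == "image":
            out.append(f'<img src="{escape(node["src"])}" alt="{escape(node["alt"])}">')
    return "\n".join(out)


def render_markdown(nodes):
    """Step two, variant B: turn the same tree back into Markdown."""
    out = []
    for node in nodes:
        if node["type"] == "heading":
            out.append("#" * node["level"] + " " + node["text"])
        elif node["type"] == "image":
            title = f' "{node["title"]}"' if node["title"] else ""
            out.append(f'![{node["alt"]}]({node["src"]}{title})')
    return "\n\n".join(out)
```

Modifying the document then amounts to editing nodes (for example, rewriting each image's `src`) between parsing and rendering.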

Assuming a third-party Markdown renderer exists, you can then subclass it and override the relevant parts. For example, using mistune, you could customize a theoretical Markdown renderer to alter the image elements like this:

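from somelib import MdRenderer  # placeholder: substitute the library you find

class CustomRenderer(MdRenderer):
    def image(self, src, alt="", title=None):
        # Rewrite the image path, then delegate to the parent renderer.
        src = get_link(src)
        return super().image(src, alt, title)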

Note that the image src is modified by a function get_link. You will need to create that function or possibly make the modifications inline. You will also need to adjust the import statement according to the lib you find.
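As a rough idea of what `get_link` might do for the Vuepress case in the question, here is a minimal sketch using only the standard library. The names `slugify` and `make_get_link` are hypothetical helpers, and the slug rule (lowercase, hyphens for everything else) is an assumption; use whatever scheme gives each .md file a unique name.

```python
import posixpath
import re
from urllib.parse import urlsplit


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def make_get_link(md_filename: str):
    """Build a get_link function bound to one Markdown file's slug."""
    stem = posixpath.splitext(posixpath.basename(md_filename))[0]
    slug = slugify(stem)

    def get_link(src: str) -> str:
        parts = urlsplit(src)
        # Leave absolute URLs (http://...) and site-rooted paths (/...) alone.
        if parts.scheme or parts.netloc or src.startswith("/"):
            return src
        # Collapse redundant segments such as "./" and root the path
        # under the slug directory.
        return posixpath.normpath(posixpath.join("/", slug, src))

    return get_link
```

With `get_link = make_get_link("My Notes.md")`, the call `get_link("./02.issue.png")` returns `/my-notes/02.issue.png`. Copying the image file into `.vuepress/public/<slug>/` would still be a separate step.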

To use your custom renderer do this:

import mistune

markdown = mistune.create_markdown(renderer=CustomRenderer())
output = markdown(text)  # text holds the Markdown source as a string

If you were to create your own Markdown renderer it might look something like this:

from mistune.renderers import BaseRenderer

class MdRenderer(BaseRenderer):
    NAME = 'md'

    # other elements defined here

    def image(self, src, alt="", title=None):
        src = get_link(src)
        # title may be None when the image has no title, so never format it raw.
        title = f' "{title}"' if title else ''
        return f'![{alt}]({src}{title})'

    # other elements defined here

Of course, you will need to define methods for every type of element within a Markdown document.

Note that I used Python f-strings in my example, which require Python 3.6 or later. You may need to adjust if using an older version.
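The front matter part of the question probably does not need a parser at all, since it only prepends text. A minimal sketch, assuming flat string fields: the function name `add_front_matter` is hypothetical, and values are written as JSON-quoted strings, which YAML accepts as double-quoted scalars, to avoid depending on a YAML library.

```python
import json


def add_front_matter(markdown_text: str, fields: dict) -> str:
    """Prepend a YAML front matter block unless the text already has one."""
    if markdown_text.startswith("---"):
        # Assume an existing front matter block and leave the text untouched.
        return markdown_text
    lines = ["---"]
    for key, value in fields.items():
        # JSON-encoded strings are valid YAML double-quoted scalars.
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown_text
```

For nested or non-string values, a real YAML library would be the safer choice.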

Upvotes: 1
