user16100351

Reputation: 301

How can I parse Markdown into an AST, manipulate it, and write it back to Markdown?

I want to modify Markdown files programmatically.

I have been looking into Markdown parsers and tried a few of them, namely Marked, markdown-it and CommonMark. They give access to an AST, which allows me to modify the content easily.

The problem is that they render to HTML only. I couldn't find any info on rendering back to Markdown.

I see two options right now: either write a custom renderer for one of these libraries (which would be quite time-consuming) or use a separate tool that transforms HTML back to Markdown.

Is there an easier alternative? And why would a Markdown parser only render to HTML?

Upvotes: 19

Views: 9216

Answers (3)

Inigo

Reputation: 15030

The best alternative is what you wanted to do in the first place!

There are many Markdown parsers that produce ASTs, and a good number of those can render it back to Markdown!

And why would a Markdown parser only render to HTML?

A lot of them do because the number one use of Markdown is as a source format for HTML. Markdown was designed for that in the first place. So the most common use of a Markdown parser, including cases where people want to manipulate the AST first, is to output HTML.

That said, the really good libraries include the option to render to other formats, including back to Markdown.

Here are the libraries that I already know can do this:

Pandoc

Probably the number one Markdown toolkit in the world. Pandoc's native language is Haskell, but there are JavaScript wrappers (just search npm). If you're going to do a lot of Markdown work down the road, it probably makes sense to become knowledgeable in Pandoc anyway.

Its support for filters is all about AST manipulation. It has special support for Lua and Lua filters, which might be the easiest to code, but you can also write filters in other languages: Python, PHP, Perl, JavaScript/TypeScript, Groovy, Ruby.

It supports rendering to Markdown, among a huge number of other formats.

Its parser and renderer have many other options that might make your job even easier, or that may already do exactly what you want. There are also many filters other people have written that may already cover your use case.
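To make the filter idea concrete, here is a minimal sketch of a JSON filter written in plain JavaScript. It relies on the shape of Pandoc's JSON AST, where nodes are objects of the form `{ t: "Type", c: contents }`. The names `walk`, `upperCaseStrings` and `runAsFilter` are made up for this illustration and are not part of Pandoc. The transformation itself (upper-casing every `Str` node) is just a placeholder for whatever edit you actually need.

```javascript
// Generic recursive walk over Pandoc's JSON AST.
// `visit` may return a replacement node, or undefined to keep descending.
function walk(node, visit) {
  if (Array.isArray(node)) {
    return node.map((child) => walk(child, visit));
  }
  if (node !== null && typeof node === 'object') {
    const replaced = visit(node);
    if (replaced !== undefined) return replaced;
    // Copy the object so the input document is left untouched.
    const copy = {};
    for (const [key, value] of Object.entries(node)) {
      copy[key] = walk(value, visit);
    }
    return copy;
  }
  return node; // strings, numbers, booleans, null
}

// Placeholder transformation: upper-case the text of every Str node.
function upperCaseStrings(doc) {
  return walk(doc, (node) =>
    node.t === 'Str' ? { t: 'Str', c: node.c.toUpperCase() } : undefined
  );
}

// Hypothetical entry point: read the JSON AST from stdin and
// write the transformed AST to stdout, as a JSON filter is expected to do.
function runAsFilter() {
  let input = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { input += chunk; });
  process.stdin.on('end', () => {
    const result = upperCaseStrings(JSON.parse(input));
    process.stdout.write(JSON.stringify(result));
  });
}
```

To use it as a filter, call `runAsFilter()` at the bottom of the file and run something like `pandoc input.md --filter ./upper.js -t markdown -o output.md` (the file names here are assumptions). Note that this simple walk also touches `Str` nodes inside the document metadata.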

CMark

Though this reference implementation of CommonMark is written in C, there are many Node wrappers. There is even a port to JavaScript using Emscripten. It ports the GitHub extensions, so that tables and other GFM things can also be manipulated in the AST.

It can output CommonMark, as well as HTML and LaTeX, or even an XML representation of the AST.

remark

A JavaScript-based framework specifically designed around AST manipulation. I've never used it, but it has tools that make a variety of AST manipulations easier, and many plugins, e.g. to support GFM, MDX, front matter, etc. See its README for more info on it and the entire remark/mdast/unified ecosystem.

See the answer that gives example usage: https://stackoverflow.com/a/78969216/8910547
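For a rough idea of what manipulating such a tree looks like, here is a small dependency-free sketch. The tree is written by hand in the mdast shape (nodes with a `type`, and either `children` or a `value`), and `visit` is a hand-rolled stand-in for the kind of traversal helper the unified ecosystem provides, not the real utility. `demoteHeadings` is a made-up example transformation. In a real pipeline the tree would come from the parser and go back out through the serializer.

```javascript
// Hand-rolled depth-first visitor (illustrative stand-in, not a library API).
function visit(node, type, fn) {
  if (node.type === type) fn(node);
  for (const child of node.children || []) {
    visit(child, type, fn);
  }
}

// Hand-written tree in the mdast shape for:
//   # Title
//
//   Some *text*.
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Title' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'text' }] },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

// Demote every heading by one level, capped at the maximum depth of 6.
// Mutates the tree in place and returns it for convenience.
function demoteHeadings(root) {
  visit(root, 'heading', (heading) => {
    heading.depth = Math.min(heading.depth + 1, 6);
  });
  return root;
}

demoteHeadings(tree);
console.log(tree.children[0].depth);
```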

Upvotes: 6

Casiano

Reputation: 491

With remark

Here is an example of generating the Markdown Abstract Syntax Tree (MDAST) for a Markdown file example.md using remark:

import fs from 'node:fs';
import { remark } from 'remark';
import remarkGfm from 'remark-gfm';

// Path to the Markdown file; defaults to ./example.md
const filePath = process.argv[2] || './example.md';

const markdownContent = fs.readFileSync(filePath, 'utf-8');

// remark() already bundles remark-parse; GFM support comes from the remark-gfm plugin
const ast = remark()
    .use(remarkGfm)
    .parse(markdownContent);

console.log(JSON.stringify(ast, null, 2));

With mdast-util-from-markdown

Here is an example of how to produce the MDAST using mdast-util-from-markdown for this input file frontmatterplusmath.md:

import fs from 'node:fs/promises'
import {frontmatter} from 'micromark-extension-frontmatter'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {frontmatterFromMarkdown, frontmatterToMarkdown} from 'mdast-util-frontmatter'
import {toMarkdown} from 'mdast-util-to-markdown'
import {math} from 'micromark-extension-math'
import {mathFromMarkdown, mathToMarkdown} from 'mdast-util-math'

const doc = await fs.readFile('frontmatterplusmath.md')

const tree = fromMarkdown(doc, {
  extensions: [
    math(), 
    frontmatter(['yaml', 'toml']), 
  ],
  mdastExtensions: [
    frontmatterFromMarkdown(['yaml', 'toml']), 
    mathFromMarkdown()
  ]
})

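console.log(JSON.stringify(tree, null, 2))

// Serialize the (possibly modified) tree back to Markdown,
// using the matching to-markdown extensions imported above.
const output = toMarkdown(tree, {
  extensions: [
    frontmatterToMarkdown(['yaml', 'toml']),
    mathToMarkdown()
  ]
})

console.log(output)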

Upvotes: 0

Sebastian Landwehr

Reputation: 399

I just found mdast-util-from-markdown which seems to do the trick. Then you can convert it back to a string with mdast-util-to-markdown. mdast is basically a markdown syntax tree specification.
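To illustrate what "converting the tree back to a string" involves, here is a toy serializer for a handful of mdast node types. This is only a sketch to show the idea and is not how mdast-util-to-markdown is implemented. The function name `toyToMarkdown` is made up, and the sketch ignores escaping of special characters, lists, links, code blocks and everything else a real serializer has to handle.

```javascript
// Toy serializer for a small subset of mdast node types (illustration only).
function toyToMarkdown(node) {
  // Serialize inline children and concatenate them.
  const inner = () => (node.children || []).map((c) => toyToMarkdown(c)).join('');
  switch (node.type) {
    case 'root':
      // Block-level children are separated by a blank line.
      return node.children.map((c) => toyToMarkdown(c)).join('\n\n') + '\n';
    case 'heading':
      return '#'.repeat(node.depth) + ' ' + inner();
    case 'paragraph':
      return inner();
    case 'emphasis':
      return '*' + inner() + '*';
    case 'strong':
      return '**' + inner() + '**';
    case 'inlineCode':
      return '`' + node.value + '`';
    case 'text':
      return node.value;
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// Hand-written tree in the mdast shape.
const doc = {
  type: 'root',
  children: [
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Hello' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'strong', children: [{ type: 'text', value: 'bold' }] },
        { type: 'text', value: ' text.' },
      ],
    },
  ],
};

console.log(toyToMarkdown(doc));
```

The gap between this toy and a correct serializer is the reason to reach for the existing packages rather than writing your own renderer.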

Upvotes: 2
