funkyeah

Reputation: 3184

What library does GitHub use for parsing Markdown?

GitHub "uses" GitHub Flavored Markdown, but I haven't been able to find out what that means exactly. What parsing library do they use on the client to render the preview(s)?

Is the same lib used for *.md files, issues, and wiki pages?

Bonus points if you can point me to a resource that shows how github flavored markdown and commonmark overlap and how they are different.

Upvotes: 25

Views: 9094

Answers (5)

VonC

Reputation: 1326716

point me to a resource that shows how github flavored markdown and commonmark overlap and how they are different.

2025: as suggested in Schwern's answer, GitHub seems to be using gjtorikian/commonmarker, a Ruby wrapper for the kivikakk/comrak (CommonMark parser) Rust crate.
That crate is itself a port of github/cmark-gfm, with issue 371 illustrating that "it seems like GitHub has abandoned keeping up with the common mark spec.
Because of the extensions and add-ons GitHub doesn't even follow all of the GFM specs".

I'm specifically looking for info on how the header id attributes are generated, which is not covered at all in spec.

kivikakk/comrak issue 93 deals with anchor IDs, pointing to src/html.rs#Anchorizer as the code that converts header strings to canonical, unique, but still human-readable anchors.

The process seems to be:

  1. Lowercasing, where the header text is converted to lowercase.
  2. Character filtering, where only certain characters are allowed:
    • Spaces (which are later converted to dashes)
    • Dashes (-)
    • Letters
    • Unicode marks
    • Numbers
    • Connector punctuation (such as underscores)
  3. Space replacement, where each space is replaced with a dash (-).
  4. Uniqueness, where a set of already generated anchors is maintained. If the anchor already exists, a numeric suffix (e.g., -1, -2, etc.) is appended until a unique ID is produced.
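
The steps above can be sketched in Python. This is only an illustration of the process as described here, not comrak's actual code (which is Rust): the class and method names are made up for the example, and the mapping of "letters, marks, numbers, connector punctuation" to Unicode general categories L*, M*, N* and Pc is an assumption.

```python
import unicodedata


class Anchorizer:
    """Hypothetical re-implementation of the four steps listed above."""

    def __init__(self):
        # Anchors already handed out, used for the uniqueness step.
        self._seen = set()

    @staticmethod
    def _is_permitted(ch):
        # Spaces and dashes are always kept.
        if ch in (" ", "-"):
            return True
        # Assumption: letters (L*), marks (M*), numbers (N*) and
        # connector punctuation (Pc, e.g. underscore) are kept.
        category = unicodedata.category(ch)
        return category[0] in ("L", "M", "N") or category == "Pc"

    def anchorize(self, header):
        # Steps 1-3: lowercase, drop disallowed characters, spaces -> dashes.
        base = "".join(
            "-" if ch == " " else ch
            for ch in header.lower()
            if self._is_permitted(ch)
        )
        # Step 4: append -1, -2, ... until the anchor is unused.
        anchor = base
        suffix = 0
        while anchor in self._seen:
            suffix += 1
            anchor = f"{base}-{suffix}"
        self._seen.add(anchor)
        return anchor


anchors = Anchorizer()
first = anchors.anchorize("Hello, World!")
second = anchors.anchorize("Hello, World!")
```

With this sketch, first is "hello-world" and second is "hello-world-1". Note that removed punctuation does not collapse the surrounding spaces, so a header like "A & B" ends up with two consecutive dashes.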

2017:

This is now (March 2017) officially documented: see "A formal spec for GitHub Flavored Markdown"

Starting today, all Markdown user content hosted in our website, including user comments, wikis, and .md files in repositories will be parsed and rendered following a formal specification for GitHub Flavored Markdown.

The same post explains what the spec is based on:

This formal specification is based on CommonMark, an ambitious project to formally specify the Markdown syntax used by many websites on the internet in a way that reflects its real world usage.
CommonMark allows people to continue using Markdown the same way they always have, while offering developers a comprehensive specification and reference implementations to interoperate and display Markdown in a consistent way between platforms.

The idea is:

Taking the CommonMark spec and re-engineering our current user content stack around it is not a trivial endeavour.
The main issue we struggled with is that the spec (and hence its reference implementations) focuses strictly on the common subset of Markdown that is supported by the original Perl implementation.
This does not include some of the extended features that have been always available on GitHub. Most notably, support for tables, strikethrough, autolinks and task lists are missing.

In order to fully specify the version of Markdown we use at GitHub (known as GFM), we had to formally define the syntax and semantics of these features, something which we had never done before. We did this on top of the existing CommonMark spec, taking special care to ensure that our extensions are a strict and optional superset of the original specification.

Upvotes: 12

Jo Liss

Reputation: 33084

Documentation

There is a specification for GitHub Flavored Markdown, though it appears to be out of date – for example, it doesn't specify the syntax for footnotes (added in 2021, #64).

Therefore, I would suggest that GitHub's documentation is the best up-to-date source. Also take note of the many sections under "Work with advanced formatting" in the left sidebar.

Parsing library

[This section is out of date, as per Inigo's comment.]

The cmark-gfm library and tool, which is GitHub's fork of cmark, appears to be actively maintained and to match the behavior of GitHub.com, so I assume that's what they're using as of 2023.

You may need to activate extensions using the -e option to get all GitHub features. To get a list of extensions, run make && ./build/src/cmark-gfm --list-extensions.

Upvotes: 0

Schwern

Reputation: 165198

As of Dec 2024...

GitHub Markup "is the first step of a journey that every markup file in a repository goes on before it is rendered on GitHub.com: 1. github-markup selects an underlying library to convert the raw markup to HTML. See the list of supported markup formats below."

They list Commonmarker as their renderer for ".markdown, .mdown, .mkdn, .md"

It passes all of the CommonMark test suite, and is therefore spec-complete. It also includes extensions to the CommonMark spec as documented in the GitHub Flavored Markdown spec, such as support for tables, strikethroughs, and autolinking.

Previous answer from Sept 2016.

Markup is "The code we [Github] use to render README.your_favorite_markup". They list Redcarpet as their library for Markdown. This, in turn, uses Sundown. I'm not sure whether this is used for all of the site.

It also claims to have "massive extension support".

Sundown has optional support for several (unofficial) Markdown extensions, such as non-strict emphasis, fenced code blocks, tables, autolinks, strikethrough and more.

For full detail you'll probably have to dig into those libraries.

Bonus points if you can point me to a resource that shows how github flavored markdown and commonmark overlap and how they are different.

Sundown claims to be "fully standards compliant" with Markdown v1.0.0 and v1.0.3, but for the life of me I cannot find those versions. I can only find v1.0.1, and CommonMark, which is at 0.26.

Sundown passes out of the box the official Markdown v1.0.0 and v1.0.3 test suites, and has been extensively tested with additional corner cases to make sure its output is as sane as possible at all times.

The GitHub Markdown extensions are documented in their Mastering Markdown guide.

Upvotes: 16

starball

Reputation: 51040

According to GitHub's project for selecting markup libraries by content type, it uses https://github.com/gjtorikian/commonmarker for Markdown. commonmarker describes itself as a "Ruby wrapper for the comrak (CommonMark parser) Rust crate".

GitHub flavoured Markdown is a superset of CommonMark.

Upvotes: 1

redreamality

Reputation: 1164

Update for 2024

It's open source, too. Just take a look here: https://github.com/github/markup

Upvotes: -1
