darkhorse

Reputation: 8782

Custom syntax highlighting with Monaco and Monarch for Markdown

I want to do some custom highlighting using Monaco (and Monarch) for the Markdown language. Let's say I want to implement the following custom syntax highlighting rules: highlight the keywords MyKeyword1 and MyKeyword2, and highlight any line that starts with ||.

How can I implement these rules using Monarch? Can I set them to a custom color, maybe even a CSS variable? The documentation page does not have examples for this unless I'm missing something.

My Markdown looks like the following:

Hello **Markdown** and **MyKeyword2**!

|| This is special

And the Monarch settings look like this (copied from https://microsoft.github.io/monaco-editor/monarch.html):

// Difficulty: "Ultra-Violence"
// Language definition for Markdown
// Quite complex definition mostly due to almost full inclusion
// of the HTML mode (so we can properly match nested HTML tag definitions)
return {
    defaultToken: '',
    tokenPostfix: '.md',

    // escape codes
    control: /[\\`*_\[\]{}()#+\-\.!]/,
    noncontrol: /[^\\`*_\[\]{}()#+\-\.!]/,
    escapes: /\\(?:@control)/,

    // escape codes for javascript/CSS strings
    jsescapes: /\\(?:[btnfr\\"']|[0-7][0-7]?|[0-3][0-7]{2})/,

    // non matched elements
    empty: [
        'area', 'base', 'basefont', 'br', 'col', 'frame',
        'hr', 'img', 'input', 'isindex', 'link', 'meta', 'param'
    ],

    tokenizer: {
        root: [

            // headers (with #)
            [/^(\s{0,3})(#+)((?:[^\\#]|@escapes)+)((?:#+)?)/, ['white', 'keyword', 'keyword', 'keyword']],

            // headers (with =)
            [/^\s*(=+|\-+)\s*$/, 'keyword'],

            // headers (with ***)
            [/^\s*((\*[ ]?)+)\s*$/, 'meta.separator'],

            // quote
            [/^\s*>+/, 'comment'],

            // list (starting with * or number)
            [/^\s*([\*\-+:]|\d+\.)\s/, 'keyword'],

            // code block (4 spaces indent)
            [/^(\t|[ ]{4})[^ ].*$/, 'string'],

            // code block (3 tilde)
            [/^\s*~~~\s*((?:\w|[\/\-#])+)?\s*$/, { token: 'string', next: '@codeblock' }],

            // github style code blocks (with backticks and language)
            [/^\s*```\s*((?:\w|[\/\-#])+)\s*$/, { token: 'string', next: '@codeblockgh', nextEmbedded: '$1' }],

            // github style code blocks (with backticks but no language)
            [/^\s*```\s*$/, { token: 'string', next: '@codeblock' }],

            // markup within lines
            { include: '@linecontent' },
        ],

        codeblock: [
            [/^\s*~~~\s*$/, { token: 'string', next: '@pop' }],
            [/^\s*```\s*$/, { token: 'string', next: '@pop' }],
            [/.*$/, 'variable.source'],
        ],

        // github style code blocks
        codeblockgh: [
            [/```\s*$/, { token: 'variable.source', next: '@pop', nextEmbedded: '@pop' }],
            [/[^`]+/, 'variable.source'],
        ],

        linecontent: [

            // escapes
            [/&\w+;/, 'string.escape'],
            [/@escapes/, 'escape'],

            // various markup
            [/\b__([^\\_]|@escapes|_(?!_))+__\b/, 'strong'],
            [/\*\*([^\\*]|@escapes|\*(?!\*))+\*\*/, 'strong'],
            [/\b_[^_]+_\b/, 'emphasis'],
            [/\*([^\\*]|@escapes)+\*/, 'emphasis'],
            [/`([^\\`]|@escapes)+`/, 'variable'],

            // links
            [/\{+[^}]+\}+/, 'string.target'],
            [/(!?\[)((?:[^\]\\]|@escapes)*)(\]\([^\)]+\))/, ['string.link', '', 'string.link']],
            [/(!?\[)((?:[^\]\\]|@escapes)*)(\])/, 'string.link'],

            // or html
            { include: 'html' },
        ],

        // Note: it is tempting to rather switch to the real HTML mode instead of building our own here
        // but currently there is a limitation in Monarch that prevents us from doing it: The opening
        // '<' would start the HTML mode, however there is no way to jump 1 character back to let the
        // HTML mode also tokenize the opening angle bracket. Thus, even though we could jump to HTML,
        // we cannot correctly tokenize it in that mode yet.
        html: [
            // html tags
            [/<(\w+)\/>/, 'tag'],
            [/<(\w+)/, {
                cases: {
                    '@empty': { token: 'tag', next: '@tag.$1' },
                    '@default': { token: 'tag', next: '@tag.$1' }
                }
            }],
            [/<\/(\w+)\s*>/, { token: 'tag' }],

            [/<!--/, 'comment', '@comment']
        ],

        comment: [
            [/[^<\-]+/, 'comment.content'],
            [/-->/, 'comment', '@pop'],
            [/<!--/, 'comment.content.invalid'],
            [/[<\-]/, 'comment.content']
        ],

        // Almost full HTML tag matching, complete with embedded scripts & styles
        tag: [
            [/[ \t\r\n]+/, 'white'],
            [/(type)(\s*=\s*)(")([^"]+)(")/, ['attribute.name.html', 'delimiter.html', 'string.html',
                { token: 'string.html', switchTo: '@tag.$S2.$4' },
                'string.html']],
            [/(type)(\s*=\s*)(')([^']+)(')/, ['attribute.name.html', 'delimiter.html', 'string.html',
                { token: 'string.html', switchTo: '@tag.$S2.$4' },
                'string.html']],
            [/(\w+)(\s*=\s*)("[^"]*"|'[^']*')/, ['attribute.name.html', 'delimiter.html', 'string.html']],
            [/\w+/, 'attribute.name.html'],
            [/\/>/, 'tag', '@pop'],
            [/>/, {
                cases: {
                    '$S2==style': { token: 'tag', switchTo: 'embeddedStyle', nextEmbedded: 'text/css' },
                    '$S2==script': {
                        cases: {
                            '$S3': { token: 'tag', switchTo: 'embeddedScript', nextEmbedded: '$S3' },
                            '@default': { token: 'tag', switchTo: 'embeddedScript', nextEmbedded: 'text/javascript' }
                        }
                    },
                    '@default': { token: 'tag', next: '@pop' }
                }
            }],
        ],

        embeddedStyle: [
            [/[^<]+/, ''],
            [/<\/style\s*>/, { token: '@rematch', next: '@pop', nextEmbedded: '@pop' }],
            [/</, '']
        ],

        embeddedScript: [
            [/[^<]+/, ''],
            [/<\/script\s*>/, { token: '@rematch', next: '@pop', nextEmbedded: '@pop' }],
            [/</, '']
        ],
    }
};

Upvotes: 0

Views: 1075

Answers (1)

Me88_88

Reputation: 21

You can add custom rules to the Monarch tokenizer to highlight lines that start with || and the keywords MyKeyword1 and MyKeyword2. You can try this:

return {
    defaultToken: '',
    tokenPostfix: '.md',

    // Custom keywords (listed for reference; the rule in root below matches them directly)
    keywords: ['MyKeyword1', 'MyKeyword2'],

    // escape codes
    control: /[\\`*_\[\]{}()#+\-\.!]/,
    noncontrol: /[^\\`*_\[\]{}()#+\-\.!]/,
    escapes: /\\(?:@control)/,

    // escape codes for javascript/CSS strings
    jsescapes: /\\(?:[btnfr\\"']|[0-7][0-7]?|[0-3][0-7]{2})/,

    // non matched elements
    empty: [
        'area', 'base', 'basefont', 'br', 'col', 'frame',
        'hr', 'img', 'input', 'isindex', 'link', 'meta', 'param'
    ],

    tokenizer: {
        root: [
            // Add your custom rules at the top of the root array
            [/^\|\|.*$/, 'custom.pipe'],
            [/\b(?:MyKeyword1|MyKeyword2)\b/, 'custom.keyword'],

            // existing rules...
        ],

        // existing states...
    }
};
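
Note that a definition like the one above only takes effect once it is handed to Monaco. Here is a minimal sketch of that step, assuming the full stock definition is available as a plain object. The language id `markdown-custom` and the helper names `withCustomRules` and `registerCustomMarkdown` are made up for illustration; a separate id is used so the custom tokenizer does not compete with the built-in `markdown` one.

```javascript
// Hypothetical language id; any id that is not already in use should work.
const LANGUAGE_ID = 'markdown-custom';

// The two rules from the answer above: a whole line starting with "||",
// and the two keywords as standalone words.
const customRules = [
    [/^\|\|.*$/, 'custom.pipe'],
    [/\b(?:MyKeyword1|MyKeyword2)\b/, 'custom.keyword'],
];

// Returns a copy of a Monarch definition with the custom rules placed
// first in the root state, so they are tried before the stock rules.
// The original definition object is left untouched.
function withCustomRules(baseDefinition) {
    return {
        ...baseDefinition,
        tokenizer: {
            ...baseDefinition.tokenizer,
            root: [...customRules, ...baseDefinition.tokenizer.root],
        },
    };
}

// `monaco` is passed in as a parameter so this does not depend on how
// Monaco is loaded (AMD loader, bundler, ...).
function registerCustomMarkdown(monaco, baseDefinition) {
    monaco.languages.register({ id: LANGUAGE_ID });
    monaco.languages.setMonarchTokensProvider(LANGUAGE_ID, withCustomRules(baseDefinition));
    return LANGUAGE_ID;
}
```

The returned id is what you would then pass as `language` to `monaco.editor.create`. One caveat, as far as I can tell: Monarch takes the first rule that matches at the current position, so `**MyKeyword2**` in the sample text is still consumed as a single `strong` token by the stock bold rule. Only a keyword that is not wrapped in other markup gets `custom.keyword`.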

To set colors for these tokens, you can use the monaco.editor.defineTheme function to define a custom theme. You might try this:

monaco.editor.defineTheme('myTheme', {
    base: 'vs',
    inherit: true,
    rules: [
        { token: 'custom.pipe', foreground: 'FF0000' }, // red for lines starting with ||
        { token: 'custom.keyword', foreground: '00FF00' } // green for MyKeyword1 and MyKeyword2
    ],
    colors: {} // expected by the theme typings; no editor colors are overridden here
});

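// The language id must be the one the Monarch definition above was registered under
// (via monaco.languages.setMonarchTokensProvider); otherwise the custom tokens never appear.
monaco.editor.create(document.getElementById('container'), {
    value: 'Hello **Markdown** and **MyKeyword2**!\n\n|| This is special',
    language: 'markdown',
    theme: 'myTheme'
});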
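
On the CSS variable part of the question: as far as I know, theme rules take literal six-digit hex colors rather than `var(--x)` references, so one possible workaround is to read the variable's computed value and convert it to hex before calling `defineTheme`. Here is a minimal sketch under those assumptions. The helper names `toThemeHex`, `readCssVariable` and `buildRules` are made up, and it only handles `#rgb` and `#rrggbb` values.

```javascript
// Converts a CSS hex color such as " #0f0" or "#00FF00" into the
// six-digit form without "#", matching the style of the rules above.
// Hypothetical helper: named colors, rgb(), hsl() etc. are rejected.
function toThemeHex(cssValue) {
    const value = cssValue.trim();
    const short = /^#([0-9a-f])([0-9a-f])([0-9a-f])$/i.exec(value);
    if (short) {
        // Expand "#rgb" to "RRGGBB" by doubling each digit.
        return (short[1] + short[1] + short[2] + short[2] + short[3] + short[3]).toUpperCase();
    }
    const full = /^#([0-9a-f]{6})$/i.exec(value);
    if (full) {
        return full[1].toUpperCase();
    }
    throw new Error('Unsupported color value: ' + cssValue);
}

// Reads a CSS custom property from the given element (browser only).
function readCssVariable(element, name) {
    return getComputedStyle(element).getPropertyValue(name);
}

// Builds the theme rules from already-resolved color strings.
function buildRules(pipeColor, keywordColor) {
    return [
        { token: 'custom.pipe', foreground: toThemeHex(pipeColor) },
        { token: 'custom.keyword', foreground: toThemeHex(keywordColor) },
    ];
}
```

In the browser you could then pass something like `buildRules(readCssVariable(document.documentElement, '--pipe-color'), readCssVariable(document.documentElement, '--keyword-color'))` as `rules` (both variable names are hypothetical). The theme would not follow later changes to the variables on its own; you would have to define it again after a change.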

Upvotes: 0
