Homo Sapien

Reputation: 300

How can I remove script tags from Markdown entered by the user?

In my PHP application I have a textarea that accepts Markdown from users (like Stack Overflow's), which is then displayed on the website. I am using the Laravel framework with the parsedown-laravel package.
I can do:

{!! Markdown::parse('__Hello__ Markdown!'); !!}

It works.

{!! Markdown::parse('<h1>Hello</h1> Markdown!'); !!}

It still works, and I am OK with that.

Now if I do:

{!! Markdown::parse('<script>alert("XSS Attack!!!")</script> Markdown!'); !!}

It still works!!!

How can I prevent script tags in my app using Laravel and this package?

Upvotes: 5

Views: 2927

Answers (3)

spenibus

Reputation: 4409

The original Parsedown library has an option to escape HTML:

echo Parsedown::instance()
    ->setMarkupEscaped(true) # escapes markup (HTML)
    ->text("<div><strong>*Some text*</strong></div>");

# Output:
# <p>&lt;div&gt;&lt;strong&gt;<em>Some text</em>&lt;/strong&gt;&lt;/div&gt;</p>

From Parsedown Tutorial: Get Started

Presumably, since parsedown-laravel is just a wrapper, you should be able to access that option.

Note that this escapes all HTML tags rather than only specific ones.

On the Parsedown bug tracker, in issue 229 (Disable parsing of specific elements), GitHub user moldcraft provides the following code, which could pave the way to a solution:

moldcraft commented on 2015-02-24 18:41:31 +0100

May be useful for somebody: I also use Parsedown for user comments and I wanted to replace all h1, h2, h3 with h4 to prevent SEO warnings (for e.g. only one h1 must be on the page), here is my Symfony2 service

<?php

namespace App\MainBundle\Service;

use Parsedown;
use HTMLPurifier;
use Emojione\Emojione;
use Symfony\Component\DependencyInjection\ContainerInterface;

class Markdown extends Parsedown
{
    /**
     * @var HTMLPurifier
     */
    private $purifier;

    public function __construct(ContainerInterface $container)
    {
        $this->setMarkupEscaped(true);

        {
            $purifierConfig = array(
                'HTML.ForbiddenElements' => array('h1', 'h2', 'h3'),
                'HTML.ForbiddenAttributes' => array('style', 'onclick',),
                'HTML.TargetBlank' => true,
            );

            $this->purifier = new HTMLPurifier($purifierConfig);
        }

        {
            Emojione::$imageType = 'svg';
            Emojione::$sprites = true;
            Emojione::$imagePathSVGSprites = $container->get('templating.helper.assets')->getUrl(
                'bundles/appmain/emojione/sprites/emojione.sprites.svg'
            );
            Emojione::$ascii = true;
        }
    }

    function text($raw)
    {
        return Emojione::shortnameToImage(
            $this->purifier->purify(
                parent::text($raw)
            )
        );
    }

    private function safeHeader($Block)
    {
        if ($Block && isset($Block['element'])) {
            /**
             * Change h1, h2, h3 to h4
             */
            if (in_array($Block['element']['name'], array('h1', 'h2', 'h3'))) {
                $Block['element']['name'] = 'h4';
            }
        }

        return $Block;
    }

    protected function blockHeader($Line)
    {
        return $this->safeHeader(
            parent::blockHeader($Line)
        );
    }

    protected function blockSetextHeader($Line, array $Block = null)
    {
        return $this->safeHeader(
            parent::blockSetextHeader($Line, $Block)
        );
    }
}

Upvotes: 3

Frog

Reputation: 1641

If you take a look at the Markdown specification (either the original syntax by John Gruber or CommonMark), you'll find that Markdown is not supposed to replace HTML. Its only goal is to make the text you write easier to read. Since Markdown covers only a small subset of HTML tags, you can still use inline HTML to create exactly what you want. In fact, John Gruber says the following:

For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.

So this is how Markdown is supposed to work. Obviously, that is not acceptable when you are parsing user input. And because the Markdown parser outputs HTML, you can't simply run htmlentities or a similar function over its output: that would escape the parser's own tags as well.
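
To illustrate the point about escaping, here is a small sketch that uses only PHP's built-in htmlspecialchars. The $rendered string is a hand-written stand-in for what a Markdown parser might return, not actual Parsedown output.

```php
<?php

// Stand-in for what a Markdown parser might return for the input
// "__Hello__ <script>alert(1)</script>" (hand-written, not real parser output).
$rendered = '<p><strong>Hello</strong> <script>alert(1)</script></p>';

// Escaping the rendered HTML neutralizes the script tag...
$escaped = htmlspecialchars($rendered, ENT_QUOTES, 'UTF-8');

// ...but it also escapes the <p> and <strong> tags the parser generated,
// so the browser would display the markup as literal text.
echo $escaped, PHP_EOL;
```

Every angle bracket ends up as an entity, so the formatting the user legitimately asked for is lost along with the script.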

The easiest way to solve your problem is to use an HTML filtering library such as HTML Purifier. It strips malicious code from the Markdown parser's output and helps prevent XSS attacks. In short: call the Markdown parser first, then pass its output to HTML Purifier.
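
A minimal sketch of that two-step pipeline might look like the following. It assumes that Parsedown and HTML Purifier have been installed with Composer (erusev/parsedown and ezyang/htmlpurifier) and that the autoloader lives in vendor/ next to the script. The helper name renderUserMarkdown is hypothetical, and the default HTML Purifier configuration is used as a starting point rather than a recommended policy.

```php
<?php

// Assumption: both libraries were installed with Composer.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical helper: Markdown in, filtered HTML out.
function renderUserMarkdown(string $markdown): string
{
    // Step 1: convert Markdown to HTML. Raw HTML in the input passes through.
    $html = (new Parsedown())->text($markdown);

    // Step 2: filter the generated HTML. HTML Purifier works from a
    // whitelist of allowed elements, and <script> is not on it.
    $config = HTMLPurifier_Config::createDefault();
    // Disable the on-disk definition cache so the sketch runs even when
    // the library directory is not writable (slower, fine for a demo).
    $config->set('Cache.DefinitionImpl', null);

    return (new HTMLPurifier($config))->purify($html);
}

echo renderUserMarkdown('<h1>Hello</h1> __Markdown__ <script>alert("XSS")</script>');
```

In a Laravel view, the return value of such a helper could then be printed with {!! !!} in place of the raw Markdown::parse() output.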

Upvotes: 5

arkascha

Reputation: 42915

Accepting user input and integrating it seamlessly into the application's code can never be secure. It is a no-go.

If this is just about displaying the code, then you can do so using a <textarea> tag, for example. You can style it so that it does not look like an input. Or you can simply use a function like htmlspecialchars() in combination with a <pre> tag.
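
The second option could be sketched as follows, using PHP's built-in htmlspecialchars. The helper name renderAsPlainText is hypothetical, and this approach shows the input verbatim, so no Markdown formatting is rendered.

```php
<?php

// Hypothetical helper: shows user-submitted text verbatim inside a <pre> block.
function renderAsPlainText(string $userInput): string
{
    // htmlspecialchars converts &, <, > and quotes to entities, so the
    // browser displays any tags as text instead of interpreting them.
    return '<pre>' . htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8') . '</pre>';
}

echo renderAsPlainText('<script>alert("XSS Attack!!!")</script> Markdown!'), PHP_EOL;
// prints: <pre>&lt;script&gt;alert(&quot;XSS Attack!!!&quot;)&lt;/script&gt; Markdown!</pre>
```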

Upvotes: -2
