pppp
pppp

Reputation: 231

shortcut to escaping to prevent XSS

I've just discovered that my website (html/php) is vulnerable to XSS attacks.
Is there any way to sanitize my data besides manually adding htmlspecialchars to each individual variable that I send to the webpage (and proably missing a few thereby leaving it still open to attack)?

Upvotes: 0

Views: 240

Answers (1)

deceze
deceze

Reputation: 522076

No, there is no shortcut. Data escaping always needs to happen on a case by case basis; not only with regards to HTML, but to any other textual format as well (SQL, JSON, CSV, whathaveyou). The "trick" is use tools which do not require you to think about this much and hence may allow you to "miss" something. If you're just echoing strings into other strings, you're working at the bare metal level and you do need a lot of conscious effort to escape everything. The generally accepted alternative is to use a templating language which implicitly escapes everything.

For example, Twig:

The PHP language is verbose and becomes ridiculously verbose when it comes to output escaping:

<?php echo $var ?>
<?php echo htmlspecialchars($var, ENT_QUOTES, 'UTF-8') ?>

In comparison, Twig has a very concise syntax, which make templates more readable:

{{ var }}
{{ var|escape }}
{{ var|e }}         {# shortcut to escape a variable #}

To be on the safe side, you can enable automatic output escaping globally or for a block of code:

{% autoescape true %}
    {{ var }}
    {{ var|raw }}     {# var won't be escaped #}
    {{ var|escape }}  {# var won't be doubled-escaped #}
{% endautoescape %}

This still lets you shoot yourself in the foot, but is a lot better.

One step up still is PHPTAL:

<div class="item" tal:repeat="value values">
  <div class="title">
    <span tal:condition="value/hasDate" tal:replace="value/getDate"/>
    <a tal:attributes="href value/getUrl" tal:content="value/getTitle"/>
  </div>
  <div id="content" tal:content="value/getContent"/>
</div>

It requires you to write valid HTML simply to compile the template, and the template engine is fully aware of HTML-syntax and will process all user data at the level of a DOM, instead of a string soup. This relegates HTML to a pure serialisation format (which it should be anyway) which is produced by a serialiser whose only job it is to turn an object oriented data structure into text. There's no way to mess up that syntax through bad escaping.

Upvotes: 2

Related Questions