Coder

Reputation: 410

How to sanitize window objects to prevent reflected XSS attacks in Java

I'm writing a servlet-based application in which I need to handle XSS vulnerabilities. I've implemented the following logic to sanitize input using the ESAPI and jsoup libraries. Each request carries its form parameters as plain text, and I just want to sanitize them: if malicious content is found, throw an exception; otherwise continue the request flow.


// javax.servlet is assumed here; use jakarta.servlet on Jakarta EE 9+
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;
import org.jsoup.safety.Safelist;
import org.owasp.esapi.ESAPI;

public class XSSRequestWrapper extends HttpServletRequestWrapper {

    public XSSRequestWrapper(HttpServletRequest request) {
        super(request);
    }

    @Override
    public String[] getParameterValues(String parameter) {
        String[] values = super.getParameterValues(parameter);
        if (values == null) {
            return null; // parameter not present in the request
        }
        for (String value : values) {
            sanitizeXSS(value);
        }
        return values;
    }

    @Override
    public String getParameter(String parameter) {
        String value = super.getParameter(parameter);
        sanitizeXSS(value);
        return value;
    }

    // Throws if the input changes when cleaned; otherwise returns it untouched
    private String sanitizeXSS(String input) {
        if (input == null) {
            return null;
        }
        String esapiValue = ESAPI.encoder().canonicalize(input, false, false);
        esapiValue = esapiValue.replaceAll("\0", "");
        String sanitizedStr = Jsoup.clean(esapiValue, Safelist.simpleText());
        sanitizedStr = Parser.unescapeEntities(sanitizedStr, false);

        // Comparing the values above to detect an XSS attempt
        if (!esapiValue.equalsIgnoreCase(sanitizedStr)) {
            throw new RuntimeException("Found malicious content in the user input");
        }
        return input;
    }
}

The code above works fine for opening and closing tags such as

 - <script>alert()</script>
 - <div>....</div>
 - <script>malicious data...

and so on, but it fails for the payload below.

For testing purposes I'm using payloads from https://github.com/payloadbox/xss-payload-list. How can I solve this issue?

Upvotes: 0

Views: 1574

Answers (2)

Jonathan Hedley

Reputation: 10522

What context are you presenting the result in? From your use of unescapeEntities I am guessing that you are presenting this as plain text -- e.g. in a text email body? Or you have another layer in presentation that is re-encoding HTML entities before presenting in HTML? The context matters and could impact what steps are required.

A string like ";alert('XSS');// is potentially dangerous if used unescaped in an HTML attribute.

My suggestion would be - simplify the flow and clarify if the output of the function is meant to be plain text or HTML. If it's plain text, I would do something like:

String getSanitizedPlainText(String inputHtml) {
  String text = Jsoup.parse(inputHtml).body().text(); 
  // or .wholeText() to preserve newlines
  return text;
}

And then the output is cleaned and safe for use in plain text contexts; and if you want to present it in HTML, encode any entities (using e.g. your HTML templating engine).
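As a rough illustration of the "encode any entities" step, here is a minimal sketch of escaping plain text for an HTML body context. The class and method names (`HtmlBodyEscaper`, `escape`) are hypothetical and not part of jsoup or ESAPI; in a real application the templating engine or an established encoder library would normally do this for you.

```java
// Hypothetical helper, for illustration only.
final class HtmlBodyEscaper {
    private HtmlBodyEscaper() {}

    /** Escapes the characters that are significant in HTML body text. */
    static String escape(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;
        System.out.println(escape("<script>alert('XSS')</script>"));
    }
}
```

The stored value stays exactly as the user typed it; the escaping happens only at the point where the text is written into the page.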

This pattern doesn't really make sense to me:

String sanitizedStr=Jsoup.clean(esapiValue,Safelist.simpleText());
sanitizedStr=Parser.unescapeEntities(sanitizedStr,false);

As the result of .clean() is HTML and you are then unEscaping. Just skip that double step and use one of the .text() methods instead.


After your edit, it's still not clear to me what content your input is (HTML, or?), and what context you want to display the output in.

I would break down the decision tree as:

1: If your input is HTML, and you want to keep it as HTML and make it safe, use the jsoup HTML Cleaner. You can optionally control which tags and attributes to preserve. The output is HTML.
2: If your input is HTML, but you only want the text content, use a text() method (and if the output context is an HTML body, escape it in your presentation layer).
3: Otherwise, if your input is just text, don't do anything on input, and escape it if necessary on output.

If you are using multiple methods (like in your original example of using ESAPI, then jsoup HTML cleaning and keeping only textnodes, then unescaping and converting that from HTML to plain text) -- I feel your problem statement or solution design is under-specified and it needs a re-think. I would expect to see only one step, as outlined in the list before.

Or, you need to more crisply define what is "malicious". In your earlier example (which you removed on edit), the strings which were emitted were not dangerous if used in an HTML body context or a plain-text context. That one of the methods changed the input string doesn't necessarily make it malicious, in my view. But you could define what you consider an attack (vs just a string getting escaped or whatever) and additionally scan for that. I.e., consider two distinct passes: one over the input to decide if you think there's an "attack", and a second (which always needs to run, regardless of the output of the first) to just normalize and sanitize the input, following the decision tree I mentioned above.
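The two-pass idea could be sketched roughly as follows. Everything here is an assumption for illustration: the class name `TwoPassInputCheck`, the three signature patterns, and the choice of normalization are all hypothetical. The signature list is deliberately incomplete and will produce false positives, so it is not a defense on its own.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: detection and normalization kept as separate passes.
final class TwoPassInputCheck {
    private TwoPassInputCheck() {}

    // Example signatures only; what counts as an "attack" is for you to define.
    private static final List<Pattern> ATTACK_SIGNATURES = List.of(
            Pattern.compile("<\\s*script", Pattern.CASE_INSENSITIVE),
            Pattern.compile("javascript\\s*:", Pattern.CASE_INSENSITIVE),
            Pattern.compile("\\bon\\w+\\s*=", Pattern.CASE_INSENSITIVE));

    /** Pass 1: detection only (for logging or rejecting); never modifies the input. */
    static boolean looksLikeAttack(String input) {
        if (input == null) {
            return false;
        }
        for (Pattern signature : ATTACK_SIGNATURES) {
            if (signature.matcher(input).find()) {
                return true;
            }
        }
        return false;
    }

    /** Pass 2: normalization that always runs, whatever pass 1 reported. */
    static String normalize(String input) {
        if (input == null) {
            return "";
        }
        // Drop NUL and other control characters, keeping tab, LF and CR.
        return input.replaceAll("[\\p{Cntrl}&&[^\\t\\n\\r]]", "").trim();
    }

    public static void main(String[] args) {
        String raw = "<img src=x onerror=alert(1)>";
        System.out.println(looksLikeAttack(raw)); // decide whether to log or reject
        System.out.println(normalize(raw));       // normalized value; still escape it on output
    }
}
```

Keeping the passes separate means a miss in the detector does not weaken the second pass, and a false positive in the detector does not alter the stored data.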

Upvotes: 0

Kevin W. Wall

Reputation: 1462

First off, the best defense against XSS is to use proper contextual output encoding, not HTML sanitization, which is essentially what you are doing here. HTML sanitization is intended for cases where output encoding is not possible because you have a requirement to accept certain (HTML) markup. An example would be a rich text editor of the kind you often find in text fields, such as the one that Stack Overflow uses to accept this answer.

And even then, if you must accept markup, it's important that you only accept safe markup. Unfortunately, the way that Jsoup works is that rather than recognizing only safe markup, it tries to prevent someone from entering unsafe markup. (That is, it operates as a block-list rather than an allow-list.) And a block-list approach is a game that you cannot possibly win. If you really must use HTML sanitization, a better approach would be to use a library that takes an allow-list approach. So something like OWASP AntiSamy or the OWASP Java HTML Sanitizer would both be much better and safer choices than Jsoup.

That said, if you have some select input that you need cleaned, you can use a combination of ESAPI with AntiSamy via one of the various Validator.getValidSafeHTML methods. As noted in the Javadoc, Validator.getValidSafeHTML:

Returns canonicalized and validated "safe" HTML that does not contain unwanted scripts in the body, attributes, CSS, URLs, or anywhere else, any validation exceptions are added to the supplied errorList.

The default behavior of this check depends on the antisamy-esapi.xml configuration. Implementors should reference the OWASP AntiSamy project for ideas on how to do HTML validation in a whitelist way, as this is an extremely difficult problem.

That said, that still is not advised over general contextual output encoding, but it will be better than what you have.
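To make "contextual" concrete, here is a minimal sketch showing that the same untrusted string needs a different encoding depending on where it is written. The class `ContextualEncoder` and its methods are hypothetical, written in plain Java only so the example is self-contained; in practice an established encoder (for example ESAPI's `encodeForHTMLAttribute` and `encodeForJavaScript`) would be the usual choice.

```java
// Hypothetical sketch of per-context output encoding; not a library API.
final class ContextualEncoder {
    private ContextualEncoder() {}

    private static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
    }

    /** Quoted HTML attribute value: all but ASCII letters and digits become hex character references. */
    static String forHtmlAttribute(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (isAsciiAlphanumeric(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    /** JavaScript string literal: all but ASCII letters and digits become backslash-u escapes. */
    static String forJavaScriptString(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAsciiAlphanumeric(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String payload = "\";alert('XSS');//";
        System.out.println(forHtmlAttribute(payload));    // safe inside value="..."
        System.out.println(forJavaScriptString(payload)); // safe inside var x = "...";
    }
}
```

Neither output is interchangeable with the other: character references are not decoded inside a script block, and backslash escapes mean nothing in an attribute, which is why the encoding has to be chosen at output time rather than applied once on input.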

Lastly, I would advise you to thoroughly read through the ESAPI GitHub wiki page "XSS Defense: No Silver Bullets". It will describe why doing what you are attempting (mostly documented under the "Interceptors" section) is an anti-pattern that ought to be avoided except as an absolute last resort.

Upvotes: 0
