Inondle

Reputation: 360

Does setAttribute automatically escape HTML characters?

I'm investigating a bug in our system where a link's title attribute is being set to something akin to click if value > 400 but the actual tooltip being displayed is click if value &gt; 400. This title value is defined by user input, so the original engineer escaped the text to avoid an XSS vulnerability: click if value > 400 becomes click if value &gt; 400.

This extra escaping step seems to cause HTML special characters to be escaped too much so their escaped values are being rendered literally.

To be extra thorough I checked the HTML spec and according to this line it appears that the setAttribute function must automatically escape the attribute's value string.

https://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-F68F082

"If an attribute with that name is already present in the element, its value is changed to be that of the value parameter. This value is a simple string; it is not parsed as it is being set. So any markup (such as syntax to be recognized as an entity reference) is treated as literal text, and needs to be appropriately escaped by the implementation when it is written out."

As I understand it, this line means that the setAttribute function should escape HTML special characters. Is that the correct interpretation?

Upvotes: 5

Views: 3215

Answers (2)

BoltClock

Reputation: 724342

The plain English interpretation of that quote is that setAttribute() does not parse the value as HTML. The reason is that you're not writing HTML at all; the value is plain text, so characters that would normally be special in HTML have no special meaning, and escaping them as though they were HTML would actually be destructive.

&gt; is the HTML representation of >. You only need to encode it in HTML, not in plain text.
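As a rough illustration of the over-escaping described in the question, here is a minimal sketch. The escapeHtml helper and the setTitle wrapper are hypothetical stand-ins, not the asker's actual code; only setAttribute, getAttribute, and createElement are real DOM calls, and the DOM part only runs where a document is available.

```javascript
// Hypothetical helper standing in for the escaping step from the question.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical wrapper: sets the title and reads back what the DOM stored.
// `link` is any object exposing setAttribute/getAttribute, e.g. an <a> element.
function setTitle(link, text) {
  link.setAttribute("title", text); // the string is stored verbatim, not parsed as HTML
  return link.getAttribute("title");
}

const userInput = "click if value > 400";
const escaped = escapeHtml(userInput); // "click if value &gt; 400"

if (typeof document !== "undefined") {
  const link = document.createElement("a");
  console.log(setTitle(link, escaped));   // over-escaped: the entity text is kept literally
  console.log(setTitle(link, userInput)); // plain text: the tooltip shows ">"
}
```

Under these assumptions, passing the raw user input to setAttribute is enough; HTML escaping is only needed when the value is concatenated into an HTML string.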

Upvotes: 4

Quentin

Reputation: 944202

Not exactly.

HTML is a data format.

Browsers will parse HTML and generate a DOM from it. It is at this point that character references (like &gt;) get converted to the characters they represent (like >).

When you use setAttribute, you directly change the DOM.

This bypasses the HTML data format entirely, so the HTML foo="&amp;" and the JavaScript setAttribute("foo", "&") will give you the same end result.
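The equivalence can be checked with a small sketch like the following. The function names attributeFromHtml and attributeFromDom are made up for illustration, and the comparison only runs in an environment that provides a document (such as a browser console).

```javascript
// Route 1: let the HTML parser build the element; it decodes &amp; to &.
function attributeFromHtml(doc) {
  const container = doc.createElement("div");
  container.innerHTML = '<a foo="&amp;"></a>';
  return container.firstChild.getAttribute("foo");
}

// Route 2: write to the DOM directly; no parsing or decoding takes place.
function attributeFromDom(doc) {
  const link = doc.createElement("a");
  link.setAttribute("foo", "&");
  return link.getAttribute("foo");
}

if (typeof document !== "undefined") {
  // Both routes should leave the attribute holding a single "&" character.
  console.log(attributeFromHtml(document) === attributeFromDom(document));
}
```

By the same reasoning, setAttribute("foo", "&amp;") would store the five characters &amp; as literal text, which is the over-escaping seen in the question.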

Upvotes: 2
