Reputation: 360
I'm investigating a bug in our system where a link's title attribute is being set to something akin to "click if value > 400" but the actual tooltip being displayed is "click if value &gt; 400". This title value is defined by user input, so the original engineer escaped the text so it wouldn't cause an XSS vulnerability: "click if value > 400" becomes "click if value &gt; 400".
This extra escaping step seems to cause HTML special characters to be escaped too much so their escaped values are being rendered literally.
To be extra thorough I checked the HTML spec and according to this line it appears that the setAttribute function must automatically escape the attribute's value string.
https://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-F68F082
"If an attribute with that name is already present in the element, its value is changed to be that of the value parameter. This value is a simple string; it is not parsed as it is being set. So any markup (such as syntax to be recognized as an entity reference) is treated as literal text, and needs to be appropriately escaped by the implementation when it is written out."
As I understand it, this line means that the setAttribute function should escape HTML special characters. Is that the correct interpretation?
Upvotes: 5
Views: 3215
Reputation: 724342
The plain English interpretation of that quote is that setAttribute() does not parse the value as HTML. That is because you're not writing HTML at all; the value is plain text, so characters that would be special in HTML have no special meaning, and escaping them as though they were HTML would actually be destructive.
&gt; is the HTML representation of >. You only need to encode it in HTML, not in plain text.
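A minimal sketch of that distinction, assuming a hand-rolled helper named escapeHtml (hypothetical, not taken from the question's codebase). Escaping is useful when the text is concatenated into HTML markup; when the same text goes through setAttribute(), the escaped form would be stored and shown literally. The DOM part is guarded so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: escapes text for inclusion in HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const userInput = "click if value > 400";

// Building markup as a string: escape, because an HTML parser will read it.
const markup = '<a title="' + escapeHtml(userInput) + '">link</a>';
console.log(markup);

// Setting the attribute through the DOM: no parser is involved.
if (typeof document !== "undefined") {
  const link = document.createElement("a");

  // Plain text in, plain text stored: the tooltip reads "click if value > 400".
  link.setAttribute("title", userInput);

  // Escaped text in, escaped text stored: the tooltip shows the entity literally.
  link.setAttribute("title", escapeHtml(userInput));
}
```

If that matches your situation, dropping the escaping step before setAttribute() should be enough; the value is never interpreted as markup on that path.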
Upvotes: 4
Reputation: 944202
Not exactly.
HTML is a data format.
Browsers will parse HTML and generate a DOM from it. It is at this point that character references (like &gt;) get converted to the characters they represent (like >).
When you use setAttribute, you directly change the DOM.
This bypasses the HTML data format entirely, so the HTML foo="&amp;" and the JavaScript setAttribute("foo", "&") will give you the same end result.
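As a rough illustration of the two paths, here is a sketch. decodeCharRefs is a toy stand-in for the parser's character-reference decoding (a hypothetical helper that handles only four named references, far fewer than a real HTML parser); the browser-only part at the end compares the two routes using the real DOM.

```javascript
// Toy model of character-reference decoding; a real parser supports many more references.
function decodeCharRefs(html) {
  const named = { "&lt;": "<", "&gt;": ">", "&quot;": '"', "&amp;": "&" };
  return html.replace(/&(?:lt|gt|quot|amp);/g, (ref) => named[ref]);
}

// HTML path: the parser decodes the attribute value written in markup.
const fromMarkup = decodeCharRefs("&amp;");

// DOM path: setAttribute("foo", "&") stores the string as given, no decoding step.
const fromScript = "&";

console.log(fromMarkup === fromScript);

// In a browser, the same comparison with the real parser and the real DOM.
if (typeof document !== "undefined") {
  const container = document.createElement("div");
  container.innerHTML = '<a foo="&amp;"></a>';

  const viaDom = document.createElement("a");
  viaDom.setAttribute("foo", "&");

  console.log(container.firstChild.getAttribute("foo") === viaDom.getAttribute("foo"));
}
```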
Upvotes: 2