Alexander Schimpf

Reputation: 2392

Is a url-encoded string html-safe (no injection possibilities)?

Let's say I have the following code on my view page (in ASP.NET MVC 3 Razor):

<a href='https://example.com/search?q=@Url.Encode(Model.UserInput)'>Click here</a>

Model.UserInput is a user input string that could contain any characters.

Is this totally safe, in terms of html injection and cross site scripting? Or should I HTML encode the query string after URL encoding it as well?

Of course, usually I'd eliminate dangerous input before this stage, but that's not the point.

Upvotes: 1

Views: 4088

Answers (2)

TrueBlue

Reputation: 490

You must URL encode query string parameters; you cannot and should not pass them in their raw form. The parameters (the user input in your case) could contain special URI characters such as ?, = and &. Without encoding, the user input hansel&gretel becomes https://example.com/search?q=hansel&gretel; the q query string parameter then contains only hansel, and gretel is read as a separate parameter.
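To make the hansel&gretel case concrete, here is a small sketch in Python (not the asker's Razor code; the standard urllib.parse module stands in for whatever the server uses to parse the query string):

```python
from urllib.parse import parse_qs, quote, urlsplit

user_input = "hansel&gretel"

# Raw concatenation: the & in the input is read as a parameter separator.
raw_url = "https://example.com/search?q=" + user_input
raw_params = parse_qs(urlsplit(raw_url).query, keep_blank_values=True)

# Percent-encoding the value first keeps it inside the q parameter.
encoded_url = "https://example.com/search?q=" + quote(user_input, safe="")
encoded_params = parse_qs(urlsplit(encoded_url).query, keep_blank_values=True)

print(raw_params)      # {'q': ['hansel'], 'gretel': ['']}
print(encoded_params)  # {'q': ['hansel&gretel']}
```

Passing safe="" tells quote() to escape every reserved character, which is what you want for a single parameter value.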

In addition, you must HTML encode any text you place in HTML, including fragments that are used inside attribute values. This makes sure that any character that has special meaning in HTML, such as <, > and &, gets converted to its respective HTML entity. URL encoding eliminates most of these cases, but you should still make this a habit.

Note that ' can pass through URL encoding unchanged, and (I think) it can pass through HTML encoding too. Don't use ' as an HTML attribute delimiter, or make sure that your HTML encoding function converts ' to &#039; or &apos;.
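A rough illustration of the ' problem, again in Python. One assumption to flag loudly: Python's own quote() does escape ', so safe="'" is passed here only to imitate an encoder that leaves it alone. The payload string is made up for the example.

```python
import html
from urllib.parse import quote

payload = "' onmouseover='alert(1)"

# Python's quote() would normally turn ' into %27. Passing safe="'" is an
# assumption made here only to imitate an encoder that leaves ' untouched.
lenient = quote(payload, safe="'")
strict = quote(payload, safe="")

# With ' as the attribute delimiter, the surviving ' ends the href early.
broken_tag = ("<a href='https://example.com/search?q="
              + lenient + "'>Click here</a>")

# html.escape(quote=True) rewrites ' as &#x27;, so only the two
# delimiter quotes remain as literal ' characters.
fixed_tag = ("<a href='https://example.com/search?q="
             + html.escape(lenient, quote=True) + "'>Click here</a>")

print(strict)
print(broken_tag)
print(fixed_tag)
```

Whether your own stack behaves like the lenient or the strict encoder depends on the function and version, so it is worth checking rather than assuming.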

Upvotes: 4

RobIII

Reputation: 8821

Is this totally safe

Maybe it is, maybe it isn't. I would approach this problem from another angle, ignoring safety just for now...

URL encoding serves a purpose: percent-encoding (its actual name) a URL. Imagine "URL encoding" replaced all spaces with <space width='1'> instead of the actual %20 or whatever it does now. The URL "...?q=foo bar" would become, in our imaginary example, "...?q=foo<space width='1'>bar" and be a correctly "URL encoded" URL. This might be fine in a PDF or CSV file or whatever other type of output you'd be creating, but in HTML it would cause trouble: in your case the ' would "end" the href attribute, leaving 1'> as garbage.

<a href='https://example.com/search?q=foo<space width='1'>'>

Because your output is intended for HTML you should actually, IMHO at least, do HTMLEncode(URLEncode(MyUrl)) (pseudocode).
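Spelled out as a minimal Python sketch (the function names and the extra page parameter are made up for illustration; the second parameter is there because the & between parameters is the part of a URL that HTML encoding actually changes):

```python
import html
from urllib.parse import parse_qs, quote, urlsplit


def build_search_link(user_input: str) -> str:
    """Return an <a> tag whose href carries user_input in the q parameter."""
    # Inner context: percent-encode the value for use in a query string.
    # The page parameter is hypothetical, added only to get a literal &.
    url = ("https://example.com/search?q="
           + quote(user_input, safe="") + "&page=1")
    # Outer context: HTML-encode the finished URL for use in an attribute.
    return '<a href="' + html.escape(url, quote=True) + '">Click here</a>'


def read_back(tag: str) -> str:
    """Undo both layers in reverse order and return the q value."""
    href = tag.split('"')[1]        # attribute value as written in the markup
    url = html.unescape(href)       # the HTML layer comes off first
    query = urlsplit(url).query     # then the URL layer
    return parse_qs(query, keep_blank_values=True)["q"][0]


print(build_search_link("hansel&gretel"))
# <a href="https://example.com/search?q=hansel%26gretel&amp;page=1">Click here</a>
```

Decoding in the opposite order (HTML first, then URL) gives back the original input, which is a quick sanity check that neither layer was skipped or doubled.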

Remember this: escaping is always done within a specific context. For SQL you need something like "mysql_real_escape" to escape quotes etc. and avoid SQL injection vulnerabilities. In HTML you need to escape characters like " and <. In an RTF file you would need to escape yet other characters (I don't actually know the details, but \ would become \\ or something similar). In a CSV file you'd need to escape , or ; within a field value, and in JSON output a string containing a " needs it escaped as \". Each type of output (format) needs its own escaping/encoding.
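To see how differently the same string comes out per context, here is a rough Python sketch using only the standard library. The sample value and table name are made up, and for SQL it swaps in a bound parameter with sqlite3 instead of a "mysql_real_escape"-style function, since binding avoids manual escaping altogether.

```python
import csv
import html
import io
import json
import sqlite3
from urllib.parse import quote

value = 'He said "hi", <b>5 > 3</b> & it\'s fine'

as_html = html.escape(value, quote=True)   # entities such as &lt; and &quot;
as_url = quote(value, safe="")             # percent-encoding
as_json = json.dumps(value)                # backslash escapes inside "..."

# CSV: the writer wraps the field in quotes and doubles any inner quote.
buffer = io.StringIO()
csv.writer(buffer).writerow([value, "second field"])
as_csv = buffer.getvalue()

# SQL: a bound parameter (?) keeps the value out of the statement text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", (value,))
stored = conn.execute("SELECT body FROM notes").fetchone()[0]

for label, text in [("html", as_html), ("url", as_url),
                    ("json", as_json), ("csv", as_csv.rstrip())]:
    print(label + ": " + text)
```

None of these outputs is interchangeable with another, which is the whole point: the encoder has to match the place the value ends up.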

What you are doing now is "nesting contexts": you're nesting a "URL context" in an "HTML context", so you have to escape/encode for each of them accordingly.

As TrueBlue demonstrates, it is not safe.

Upvotes: 1
