Reputation: 2392
Let's say I have the following code on my view page (in asp.net mvc 3 razor):
<a href='https://example.com/search?q=@Url.Encode(Model.UserInput)'>Click here</a>
Model.UserInput is a user input string that could contain any characters.
Is this totally safe in terms of HTML injection and cross-site scripting? Or should I also HTML encode the query string after URL encoding it?
Of course, usually I'd eliminate dangerous input before this stage, but that's not the point.
Upvotes: 1
Views: 4088
Reputation: 490
You must URL encode query string parameters; you cannot and should not pass them in their raw form. The parameters (the user input in your case) could contain special URI characters such as ?, = and &. Without encoding, the user input hansel&gretel becomes https://example.com/search?q=hansel&gretel; the q query string parameter then only contains hansel.
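To make the hansel&gretel case concrete, here is a minimal sketch. It uses Python's standard library as a stand-in for the ASP.NET helpers (that swap is my assumption, not something from the question), so treat it as an illustration of the parsing behaviour rather than of what Url.Encode itself does.

```python
from urllib.parse import parse_qs, quote

user_input = "hansel&gretel"

# Raw input: the "&" is read as a parameter separator,
# so only "hansel" ends up in q.
raw_query = "q=" + user_input
print(parse_qs(raw_query))

# Percent-encoded input: "&" becomes "%26" and stays inside the value.
encoded_query = "q=" + quote(user_input, safe="")
print(encoded_query)
print(parse_qs(encoded_query))
```

The second parse gives back the full original string, which is the whole point of encoding the parameter value.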
In addition, you must HTML encode any text you place in HTML, even fragments that are used inside attribute values. This makes sure that any character with a special meaning in HTML, such as <, > and &, gets converted to its respective HTML entity. URL encoding eliminates most of these cases, but you should still make this a habit.
Note that ' bypasses URL encoding, and it will (I think) bypass HTML encoding too. Don't use ' as an HTML attribute delimiter, or make sure that your HTML encoding function converts ' to &#39; or &apos;.
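Whether ' gets through depends on the encoder, so it is worth checking yours. The sketch below uses Python's standard library purely as a stand-in; the "lenient" encoder that treats ' as safe is a hypothetical one set up for illustration, not a claim about any particular ASP.NET version.

```python
from html import escape
from urllib.parse import quote

value = "it's"

# Hypothetical lenient URL encoder: "'" is declared safe and passes through.
lenient = quote(value, safe="'")

# Strict URL encoder: "'" is percent-encoded as %27.
strict = quote(value, safe="")

# HTML encoding with quote=True rewrites "'" as a numeric entity,
# so it can no longer terminate a single-quoted attribute.
html_safe = escape(lenient, quote=True)

print(lenient, strict, html_safe)
```

If the URL encoder lets the quote through, the HTML encoding step is what keeps it from closing the href attribute.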
Upvotes: 4
Reputation: 8821
Is this totally safe
Maybe it is, maybe it isn't. I would approach this problem from another angle, ignoring safety just for now...
URL encoding serves a purpose: percent-encoding (which is its actual name) a URL. Imagine "URL encoding" replaced all spaces with <space width='1'> instead of the actual %20 or whatever it does now. The URL "...?q=foo bar" would become, in our imaginary example, "...?q=foo<space width='1'>bar" and be a correctly "URL encoded" URL. This might be useful in a PDF or CSV file or whatever other type of output you'd be creating, but in HTML it would cause trouble: in your case the ' would "end" the href attribute, leaving 1'> as garbage.
<a href='https://example.com/search?q=foo<space width='1'>'>
Because your output is intended for HTML you should actually, IMHO at least, do HTMLEncode(URLEncode(MyUrl)) (pseudocode).
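A runnable version of that pseudocode might look like the sketch below. It is Python rather than Razor (my substitution), and the search_link helper and the extra page parameter are hypothetical names added only to show the HTML step doing visible work.

```python
from html import escape
from urllib.parse import quote


def search_link(user_input: str) -> str:
    """Build an anchor tag, encoding once per nested context."""
    # 1. URL context: percent-encode the parameter value only.
    query_value = quote(user_input, safe="")
    # "page" is a hypothetical second parameter, so the URL contains a raw "&".
    url = "https://example.com/search?q=" + query_value + "&page=1"
    # 2. HTML context: escape the finished URL for use inside an attribute.
    return "<a href='" + escape(url, quote=True) + "'>Click here</a>"


print(search_link("foo bar"))
print(search_link("a&b<c>'d'"))
```

Note that only the parameter value is percent-encoded, while the whole URL is HTML-encoded, so the separator & turns into &amp; in the markup and the browser decodes it back before following the link.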
Remember this: escaping is always done within a specific context. For SQL you need some "mysql_real_escape"-alike stuff to escape quotes etc. to avoid SQL injection vulnerabilities. In HTML you need to escape characters like " and <; in an RTF file you would need to escape yet other strings/characters (I don't actually know, but something like \ becoming \\); in a CSV file you'd need to escape , or ; within a field value; and in JSON output a string containing a " needs it escaped as \". Each type of output (format) needs its own escaping/encoding.
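To see how the same value needs different treatment per context, here is a small sketch using Python's standard library (chosen just for illustration; the sample value is made up). SQL and RTF are left out because they depend on the driver or writer you use.

```python
import csv
import io
import json
from html import escape

value = 'say "hi", <b>now</b>'

# HTML context: quotes and angle brackets become entities.
html_out = escape(value, quote=True)

# JSON context: the double quote is backslash-escaped, "<" is left alone.
json_out = json.dumps(value)

# CSV context: the field is wrapped in quotes and inner quotes are doubled.
buffer = io.StringIO()
csv.writer(buffer).writerow([value])
csv_out = buffer.getvalue()

print(html_out)
print(json_out)
print(csv_out)
```

Three outputs, three different results from one input, and none of them is interchangeable with the others.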
What you are doing now is "nesting contexts": you're nesting a "URL context" inside an "HTML context", so you have to escape/encode for both, innermost first.
As TrueBlue demonstrates, it is not safe.
Upvotes: 1