In my Rails app, why does adding %dd or %ff to a URL parameter return "invalid byte sequence in UTF-8"?
I have a regex, ^[a-zA-Z0-9_]+$, to check that a string contains only letters, numbers and underscores. When I add %dd or %ff to my URL parameter, it raises an "invalid byte sequence in UTF-8" error.
What do %dd and %ff mean?
UPDATE:
My controller:
```ruby
def search
  # Only letters, digits and underscores are allowed
  alpha_num_under_regex = /^[a-zA-Z0-9_]+$/
  @search = params[:search]
  unless @search.match(alpha_num_under_regex).nil?
    @users = User.find_by_name(@search)
    render 'api/v1/users/show', status: 200, formats: :json
  else
    @users = []
    render 'api/v1/users/show', status: 422, formats: :json
  end
end
```
My URL:
localhost:3000/api/v1/users/show?search=%dd
When the parameter is search=%d, it returns Bad Request, which is fine. But when I add another d (search=%dd or search=a%dd), it returns "Action Controller: Exception caught" with the message "invalid byte sequence in UTF-8".
So the question is: how can I get past the "invalid byte sequence in UTF-8" error?
From Wikipedia:
Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances. Although it is known as URL encoding it is, in fact, used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded media type, as is often used in the submission of HTML form data in HTTP requests.
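According to the above, the query search=%dd is interpreted as search=<BYTE_WITH_ORD_VALUE_0xDD>. Ruby expects this string to be UTF-8, but the lone byte 0xDD is not valid UTF-8: it is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xBF. The same applies to %ff, since the byte 0xFF never occurs in valid UTF-8. A lone %d, by contrast, is not a complete escape at all (a percent-escape needs two hex digits), so the query string cannot be decoded and the request is answered with Bad Request.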
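The decoding step can be reproduced in plain Ruby, outside Rails, with URI.decode_www_form_component from the standard library. Treating this as equivalent to what Rack does with the query string is an approximation. The alnum_underscore? helper at the end is hypothetical, a sketch of a server-side guard that checks valid_encoding? before running the regexp so that malformed input is rejected instead of raising.

```ruby
require "uri"

# Decode the query value like a form-encoded query string
# (an approximation of what happens to ?search=%dd).
decoded = URI.decode_www_form_component("%dd")
puts decoded.encoding         # UTF-8
puts decoded.bytes.inspect    # [221], the single byte 0xDD
puts decoded.valid_encoding?  # false

# Running a regexp over a string with broken UTF-8 raises ArgumentError,
# the "invalid byte sequence in UTF-8" error from the question.
begin
  decoded.match(/^[a-zA-Z0-9_]+$/)
rescue ArgumentError => e
  puts e.message              # invalid byte sequence in UTF-8
end

# A lone "%d" is an incomplete escape, so decoding itself fails
# (roughly the situation behind the Bad Request response).
begin
  URI.decode_www_form_component("%d")
rescue ArgumentError => e
  puts "cannot decode: #{e.message}"
end

# Hypothetical guard: check the encoding first, then the pattern
# (\A and \z anchor the whole string rather than a single line).
def alnum_underscore?(str)
  str.valid_encoding? && str.match?(/\A[a-zA-Z0-9_]+\z/)
end

puts alnum_underscore?(decoded)    # false
puts alnum_underscore?("user_42")  # true
```

Such a guard only helps if the exception is raised by the regexp match inside the action, as the error message in the question suggests.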
To avoid this problem and send what was intended, URL-escape the search query explicitly, replacing % with %25 (the percent-encoded percent sign itself):
localhost:3000/api/v1/users/show?search=%25dd
This sends the literal string %dd to Rails.
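The same escaping can be done programmatically with Ruby's standard uri library instead of by hand. A minimal sketch; the host and path are simply the ones from the question:

```ruby
require "uri"

# Escape the literal text "%dd": the "%" becomes "%25".
escaped = URI.encode_www_form_component("%dd")
puts escaped   # %25dd

# Or build the whole query string from a hash of parameters.
query = URI.encode_www_form("search" => "%dd")
puts "localhost:3000/api/v1/users/show?#{query}"
# localhost:3000/api/v1/users/show?search=%25dd

# Decoding it again yields the original text.
puts URI.decode_www_form_component(escaped)   # %dd
```

With this URL the controller receives %dd, which is valid UTF-8 but fails the regexp (because of the %), so it goes down the 422 branch instead of raising.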
NB: to be safe, build URL queries according to the general rule specified in the article linked above:
[List of reserved characters]
Other characters in a URI must be percent encoded.
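For arbitrary user input, letting URI.encode_www_form build the query is simpler than escaping characters one by one: it percent-encodes everything except ASCII letters, digits and the characters * - . _, and, following the application/x-www-form-urlencoded convention quoted above, turns spaces into +. A sketch with a made-up input string:

```ruby
require "uri"

# Made-up input with reserved characters (&, =, ?, #, /), spaces,
# a literal percent sign and a non-ASCII letter.
raw = "a&b=c?d#e/f g%h é"

query = URI.encode_www_form("search" => raw)
puts query
# search=a%26b%3Dc%3Fd%23e%2Ff+g%25h+%C3%A9

# Parsing the query string back gives the original key/value pair.
p URI.decode_www_form(query)
```

Note that the multibyte letter é becomes two escapes, %C3%A9, one per UTF-8 byte.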