johnnymire
johnnymire

Reputation: 1103

Ruby UTF8 encoding problem

I have a Ruby/Rails app.

I have an artists table in my postgresql database which I want to query by name. I have some artists with portuguese characters etc. and am having some issues querying them.

For example one band is called Legião Urbana. If I query with the string "legiã" from my app I get the following params:

{"action"=>"search_artist", "q"=>"legi\343", "controller"=>"home"}

However I get an error from the query

Artist.all(:conditions => "name LIKE '%#{params[:q]}%'")

PGError: ERROR:  invalid byte sequence for encoding "UTF8": 0xe32527

What should I be doing to convert into UTF8 or fix this from happening somehow?

Upvotes: 2

Views: 9164

Answers (2)

johnnymire
johnnymire

Reputation: 1103

I think this might do it

require 'iconv'
Iconv.conv("UTF8", "LATIN1", params[:q])

Upvotes: 4

yfeldblum
yfeldblum

Reputation: 65435

You need to know what the encoding is of that parameter in the query-string.

Ruby 1.9 includes support for strings tagged with their encodings. In Ruby 1.9, you could:

params[:q].encoding # Rails 3 on 1.9 generally presents strings in UTF-8
params[:q].encode('utf-8') # ask Ruby to re-encode it to UTF-8

Then you need to convert the parameter from that encoding into UTF-8 before doing string-interpolation (#{...} syntax).

Or you need to pass the parameter as a SQL parameter, not using string-interpolation.

Of course, this brings up the security consideration that, unless you know how to properly encode text for usage in SQL, you should never do string-interpolation to build SQL string fragments. Because SQL-fragments with parameters are quick and easy to do in Rails, you should use them.

# Rails 2
Artist.all(:conditions => ['name like ?', "%#{params[:q]}%"])
Artist.all(:conditions => ['name like :q', { :q=> "%#{params[:q]}%" }])

# Rails 3
Artist.where('name like ?', "%#{params[:q]}")
Artist.where('name like :q', :q => "%#{params[:q]}")

SQL injection is the security problem that occurs when you do string-interpolation and encode strings in a way that builds correct SQL fragments for some input strings, but not for others. In languages/frameworks where parameters are more difficult to work with, it would be acceptable to do string-interpolation or string-building (if it remains easy to do string-interpolation or string-building), so long as you research exhaustively how you are required to encode the interpolated strings to build correct SQL fragments, regardless of the input string. Because SQL injection is so easy to avoid with Rails via ordered or named parameters (see the four samples above), you should not have any problems ensuring that your SQL fragments are all safe.

Upvotes: 5

Related Questions