mwilliams

Reputation: 9978

Handling non-UTF-8 content in my Rails application appropriately

I have a Rails application that lets users import information from various sources, such as RSS feeds. My database's default encoding is UTF-8, and I've been getting a lot of exceptions caused by non-UTF-8 data that comes through the system and crashes once it hits the database.

I'm able to detect the non-UTF-8 data by calling the is_utf8? method on the attributes before a save, but I haven't found a way to handle it. I've looked at Iconv for the conversion, but it appears to require knowing which encoding I'm converting from.

Is there a simple way to do a best-guess conversion, or just strip out the non-UTF-8 characters, and then save to the database?
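A minimal sketch of the "just strip it" option, assuming Ruby 2.1 or later, where `String#valid_encoding?` and `String#scrub` are built in (the question predates these; on Ruby 1.8 people typically reached for Iconv instead). `clean_utf8` is a hypothetical helper name, not a Rails API. It reinterprets the bytes as UTF-8 and removes only the byte sequences that are not valid UTF-8, leaving everything else untouched.

```ruby
# Hypothetical helper: make a value safe to store in a UTF-8 column.
# Assumes Ruby >= 2.1 for String#scrub.
def clean_utf8(value)
  return value unless value.is_a?(String) # leave nil, numbers, etc. alone

  # Reinterpret the raw bytes as UTF-8 without transcoding them.
  str = value.dup.force_encoding(Encoding::UTF_8)

  # valid_encoding? plays the role of is_utf8?; scrub("") deletes only
  # the invalid byte sequences and keeps the valid characters.
  str.valid_encoding? ? str : str.scrub("")
end
```

In a model this could be wired into a callback such as `before_save { self.title = clean_utf8(title) }`, where `title` stands in for whichever imported attribute needs cleaning. Stripping is lossy: a Latin-1 "é" arriving as a lone `\xE9` byte simply disappears, so converting from a known or guessed encoding is preferable when that information is available.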

Thanks!

Upvotes: 2

Views: 1294

Answers (2)

bobince

Reputation: 536359

How is non-UTF-8 data getting into the system? Make sure all your pages are served with Content-Type: text/html;charset=utf-8, and browsers will then submit UTF-8 data to your forms.

(Of course, that still leaves things like mail and uploaded files, but many of those specific contexts give you a declared encoding to go on.)
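When a source does declare its encoding, for example an RSS feed fetched over HTTP with a `charset=` parameter in its Content-Type header, that label can drive the conversion directly. The sketch below assumes Ruby 2.1+ (built-in `String#encode` and `String#scrub`) and that the caller passes the response's Content-Type header string; `to_utf8` is a hypothetical name. It also assumes that a missing or unrecognized charset label should fall back to treating the bytes as UTF-8.

```ruby
# Hypothetical helper: transcode raw bytes to UTF-8 using the charset
# declared in a Content-Type header (e.g. "application/rss+xml; charset=ISO-8859-1").
def to_utf8(raw, content_type)
  # Pull the charset label out of the header, if there is one.
  charset = content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1]

  source =
    begin
      charset ? Encoding.find(charset) : Encoding::UTF_8
    rescue ArgumentError
      Encoding::UTF_8 # unknown label: assume UTF-8 and clean up below
    end

  str = raw.dup.force_encoding(source)
  if source == Encoding::UTF_8
    str.scrub("") # already UTF-8: just drop invalid byte sequences
  else
    # Transcode, dropping anything that is invalid in the source encoding
    # or has no mapping in UTF-8.
    str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
  end
end
```

Header labels are not always truthful, so a declared charset is best treated as the first guess rather than a guarantee.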

Upvotes: 1

pantulis

Reputation: 1705

Iconv is your friend when it comes to switching encodings. To detect encodings, there's a small gem available: rchardet. We have used it to detect Asian encodings in an attempt to block spam, and it worked fine.
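A sketch of the detect-then-convert approach described here, under two stated assumptions. First, Iconv was removed from Ruby's standard library in 2.0, so this uses the built-in `String#encode` in its place (Ruby 2.1+ for `String#scrub`). Second, so that it runs without extra gems, the default guesser is a crude heuristic (keep valid UTF-8, otherwise assume Windows-1252) and is not rchardet. A real detector such as rchardet could be supplied as the block, returning an encoding name or nil; check that gem's documentation for the exact shape of what its detect call returns. `guess_to_utf8` is a hypothetical name.

```ruby
# Hypothetical helper: guess the source encoding of raw bytes, then convert to UTF-8.
# The optional block receives the bytes and returns an encoding name (or nil).
def guess_to_utf8(raw)
  bytes = raw.b # binary copy, so we work on the raw bytes only

  label = yield(bytes) if block_given?
  # Fallback heuristic (not rchardet): valid UTF-8 stays UTF-8,
  # anything else is assumed to be Windows-1252.
  label ||= bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding? ? "UTF-8" : "Windows-1252"

  source =
    begin
      Encoding.find(label)
    rescue ArgumentError
      Encoding::UTF_8 # detector returned an unknown name: treat as UTF-8
    end

  if source == Encoding::UTF_8
    bytes.force_encoding(Encoding::UTF_8).scrub("")
  else
    bytes.force_encoding(source)
         .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
  end
end
```

Usage might look like `guess_to_utf8(feed_body) { |bytes| my_detector(bytes) }`, where `my_detector` is a stand-in for whatever detection library is used. The Windows-1252 fallback is only a reasonable default for Western-language feeds; text in, say, Shift_JIS would come out as mojibake, which is exactly where a statistical detector like rchardet earns its keep.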

Upvotes: 1
