almo

Reputation: 6367

Postgres invalid byte sequence for encoding "UTF8": 0xc3 0x2f

I work with a payment API and it returns some XML. For logging I want to save the API response in my database.

One word in the API response should be "manhã", but the API returns "manh�". Other characters such as á or ç are returned correctly, so I guess this is a bug in the API.

But when trying to save this in my DB I get:

Postgres invalid byte sequence for encoding "UTF8": 0xc3 0x2f

How can I solve this?

I tried things like

response.encode("UTF-8") and also force_encoding, but all I get is:

Encoding::UndefinedConversionError ("\xC3" from ASCII-8BIT to UTF-8)

I need to either remove this wrong character or convert it somehow.

Upvotes: 0

Views: 998

Answers (1)

Mark G.

Reputation: 3260

You're on the right track: the encode method should solve this. When the source encoding is known, you can simply use:

response.encode('UTF-8', 'ISO-8859-1')

Sometimes the source contains invalid characters. To avoid exceptions, you can tell Ruby how to handle them:

# This will transcode the string to UTF-8 and replace any invalid/undefined characters with '' (empty string)
response.encode('UTF-8', 'ISO-8859-1', invalid: :replace, undef: :replace, replace: '')
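
To see what these options do in practice, here is a minimal sketch. The sample bytes are an assumption modelled on the error in the question (a lone 0xC3 followed by "/"), and the variable names are hypothetical.

```ruby
# Hypothetical sample: "manh" + a stray 0xC3 byte + "/", tagged as binary,
# which is what the "from ASCII-8BIT" part of the error message suggests.
raw = "manh\xC3/".b

# Treating the bytes as ISO-8859-1: every byte is defined in that encoding,
# so nothing is replaced and the stray 0xC3 is kept as the character U+00C3.
as_latin1 = raw.encode('UTF-8', 'ISO-8859-1',
                       invalid: :replace, undef: :replace, replace: '')
puts as_latin1.valid_encoding?   # true

# Treating the bytes as plain binary: bytes above 0x7F have no UTF-8 mapping,
# so undef: :replace drops them. This also drops any legitimate accented
# characters, so it is only a last resort.
stripped = raw.encode('UTF-8', 'BINARY',
                      invalid: :replace, undef: :replace, replace: '')
puts stripped                    # manh/
```

Either result is valid UTF-8 and should be accepted by the database. Which one is appropriate depends on what encoding the API really sends.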

This is all laid out in the Ruby docs for String - check them out!

---

Note: many people incorrectly assume that force_encoding will somehow fix encoding problems. force_encoding simply tags the string with the specified encoding; it does not transcode, and it does not replace or remove invalid characters. When you're converting between encodings, you must transcode so that characters in one character set are correctly represented in the other.
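
The difference between tagging and transcoding can be sketched like this. The sample string is an assumption (the ISO-8859-1 bytes for "manhã", tagged as binary), not the actual API response.

```ruby
# Hypothetical input: ISO-8859-1 bytes for "manhã", tagged as binary.
raw = "manh\xE3".b

# force_encoding only changes the label; the bytes stay exactly the same.
tagged = raw.dup.force_encoding('UTF-8')
puts tagged.bytes == raw.bytes    # true
puts tagged.valid_encoding?       # false, a lone 0xE3 is not valid UTF-8

# encode rewrites the bytes so the same character is represented in UTF-8.
transcoded = raw.encode('UTF-8', 'ISO-8859-1')
puts transcoded.bytes.inspect     # [109, 97, 110, 104, 195, 163]
puts transcoded.valid_encoding?   # true
```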

As pointed out in the comments, you can combine force_encoding with encode to transcode your string: response.force_encoding('ISO-8859-1').encode('UTF-8'), which is equivalent to the first encode example above.
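
A quick check of that equivalence, again on a hypothetical ISO-8859-1 sample. One difference worth knowing: force_encoding relabels the receiver in place, so this sketch works on a copy to leave the original untouched.

```ruby
# Hypothetical input: ISO-8859-1 bytes for "manhã", tagged as binary.
raw = "manh\xE3".b

# One step: pass the source encoding to encode.
one_step = raw.encode('UTF-8', 'ISO-8859-1')

# Two steps: tag a copy as ISO-8859-1, then transcode it.
two_step = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts one_step == two_step               # true
puts raw.encoding == Encoding::BINARY   # true, the original is unchanged
```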

Upvotes: 1
