23tux

Reputation: 14736

Ruby 2: Detect encoding from binary ASCII-8BIT data

I have to load some data from external sources. When I check the encoding, Ruby reports ASCII-8BIT, i.e. binary. However, some of the sources are actually encoded in ISO-8859-1 and others in UTF-8. When I try to convert the ISO-8859-1 data to UTF-8 directly, I get an error. But when I do something like content.force_encoding('ISO-8859-1').encode('UTF-8'), everything works fine.
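A minimal sketch of why the direct conversion fails while the two-step version works. The sample bytes below are made up for illustration. Ruby refuses to transcode ASCII-8BIT bytes above 0x7F because "binary" says nothing about what they mean, while force_encoding only changes the label and leaves the bytes alone, so the subsequent encode knows how to interpret them.

```ruby
# Hypothetical sample: ISO-8859-1 bytes for "café" (0xE9 is "é" in Latin-1),
# tagged ASCII-8BIT the way data read in binary mode usually is.
latin1_bytes = "caf\xE9".b

# Direct transcoding from ASCII-8BIT fails on any byte above 0x7F.
begin
  latin1_bytes.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts "direct encode failed: #{e.class}"
end

# force_encoding only relabels a copy; the underlying bytes are unchanged.
relabeled = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1)
puts relabeled.bytes == latin1_bytes.bytes  # prints true

# encode actually transcodes: 0xE9 becomes the UTF-8 pair 0xC3 0xA9.
utf8 = relabeled.encode(Encoding::UTF_8)
puts utf8 == "café"                         # prints true
p utf8.bytes.last(2)                        # prints [195, 169]
```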

However, this doesn't work the other way round. When I try to encode the UTF-8 data to ISO-8859-1, I end up with broken characters like .
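The question doesn't show the exact code that produces the broken characters, so this is an assumption: one common way to get garbled output is to run the ISO-8859-1 chain on bytes that are really UTF-8. Each multi-byte UTF-8 character then gets read as two Latin-1 characters. The sketch below uses a hypothetical "café" sample to show the effect, and that simply relabeling such data as UTF-8 is enough.

```ruby
# Hypothetical sample: "café" stored as UTF-8 bytes but tagged ASCII-8BIT.
utf8_bytes = "café".b

# Wrong guess: the UTF-8 bytes 0xC3 0xA9 are read as two Latin-1 characters.
garbled = utf8_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts garbled == "cafÃ©"     # prints true

# Right guess: the bytes were UTF-8 all along, so relabeling is enough.
fixed = utf8_bytes.dup.force_encoding(Encoding::UTF_8)
puts fixed.valid_encoding?  # prints true
puts fixed == "café"        # prints true
```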

So, is there a way to detect the "underlying" encoding of the ASCII-8BIT data, and then convert it to UTF-8?
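If every source is known to be either UTF-8 or ISO-8859-1 (an assumption; it does not hold for arbitrary input), a standard-library heuristic may be enough: relabel the bytes as UTF-8 and keep them if they form valid UTF-8, otherwise treat them as ISO-8859-1. This works because Latin-1 text containing accented characters rarely happens to be valid UTF-8, while any byte sequence is valid ISO-8859-1. The helper name to_utf8 is hypothetical.

```ruby
# Minimal sketch, assuming every source is either UTF-8 or ISO-8859-1.
# to_utf8 is a hypothetical helper name, not a library method.
def to_utf8(data)
  # Relabel a copy as UTF-8 (no bytes change) and check whether it parses.
  utf8 = data.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Any byte sequence is valid ISO-8859-1, so this branch never raises,
  # but data in some third encoding would be silently misread here.
  data.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

from_latin1 = to_utf8("caf\xE9".b)  # ISO-8859-1 bytes
from_utf8   = to_utf8("café".b)     # UTF-8 bytes
puts from_latin1 == from_utf8       # prints true
puts from_latin1.encoding           # prints UTF-8
```

Pure ASCII input is valid UTF-8, so it passes through the first branch unchanged.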

Upvotes: 1

Views: 585

Answers (1)

AJFaraday

Reputation: 2450

I had a quick Google search and found the Charlock Holmes gem by Brian Lopez. It appears to do the encoding detection you're after.

https://github.com/brianmario/charlock_holmes
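A sketch of how the gem might be wired in, based on the usage shown in its README: CharlockHolmes::EncodingDetector.detect returns a hash with :type, :encoding and :confidence, and CharlockHolmes::Converter.convert transcodes via ICU. Because the gem needs ICU and may not be installed, this version falls back to a simple "valid UTF-8, else ISO-8859-1" guess when the require fails. The helper name detect_and_convert is hypothetical, and detection on very short inputs can be unreliable.

```ruby
# Use Charlock Holmes when it is installed; otherwise fall back to a guess.
begin
  require 'charlock_holmes'
  CHARLOCK_AVAILABLE = true
rescue LoadError
  CHARLOCK_AVAILABLE = false
end

# Hypothetical helper name; not part of the gem.
def detect_and_convert(data)
  raw = data.dup.force_encoding(Encoding::BINARY)

  if CHARLOCK_AVAILABLE
    detection = CharlockHolmes::EncodingDetector.detect(raw)
    # Text input yields e.g. {type: :text, encoding: "UTF-8", confidence: 100};
    # binary input comes back with type :binary and no :encoding.
    if detection && detection[:type] == :text && detection[:encoding]
      converted = CharlockHolmes::Converter.convert(raw, detection[:encoding], 'UTF-8')
      return converted.dup.force_encoding(Encoding::UTF_8)
    end
  end

  # Fallback heuristic: keep valid UTF-8, otherwise assume ISO-8859-1.
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

result = detect_and_convert("caf\xE9".b)
puts result.encoding  # prints UTF-8
```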

Upvotes: 1
