Daniel Viglione

Reputation: 9407

Reading a file with ISO-8859 encoding

According to the file command on Mac OS X, my file has ISO-8859 encoding:

$ file filename.txt
filename.txt: ISO-8859 text, with CRLF line terminators

I try to read it with that encoding:

> filename = "/Users/myuser/Downloads/filename.txt"
> content = File.read(filename, encoding: "ISO-8859")          
> content.encoding
 => #<Encoding:UTF-8> 

It doesn't work. And consequently:

 > content.split("\n")
ArgumentError: invalid byte sequence in UTF-8

Why doesn't it read the file as ISO-8859?

Upvotes: 0

Views: 1326

Answers (2)

Holger Just

Reputation: 55758

With your code, Ruby emits the following warning when reading the file:

warning: Unsupported encoding ISO-8859 ignored

This is because ISO 8859 is not a single encoding but a family of variants (ISO-8859-1, ISO-8859-2, and so on). You need to specify the correct one explicitly, e.g.:

content = File.read(filename, encoding: "ISO-8859-1")
# or equivalently
content = File.read(filename, encoding: Encoding::ISO_8859_1)
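To see which variants your Ruby build actually knows about, you can ask the Encoding class directly. This is only a small sketch; the variable names are made up for illustration, and the exact list may differ between Ruby versions.

```ruby
# List the numbered ISO 8859 variants known to this Ruby build.
iso_variants = Encoding.name_list.grep(/\AISO-8859-\d+\z/)
puts iso_variants.sort_by { |name| name[/\d+\z/].to_i }

# The bare family name is not an encoding, so the lookup fails.
begin
  Encoding.find("ISO-8859")
  bare_name_known = true
rescue ArgumentError => e
  bare_name_known = false
  puts "Lookup failed: #{e.message}"
end
```

The failed lookup is the same reason File.read ignores the encoding: option and falls back to the default external encoding.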

When dealing with text files produced on Windows machines (which the CRLF line endings hint at), you might want to use Encoding::Windows_1252 (or the string "Windows-1252") instead. As far as printable characters go, it is a superset of ISO 8859-1, and it used to be the default encoding of many Windows programs and of the system itself.
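If the rest of your program works with UTF-8, you can also let Ruby transcode while reading by passing an "external:internal" pair. The following is a rough sketch, not a drop-in solution: the temporary file and its contents are invented for the example, and it assumes the file really is Windows-1252.

```ruby
require "tempfile"

# Hypothetical sample file: two CRLF-terminated lines in Windows-1252
# (0xE9 is "é", 0xEF is "ï").
sample = Tempfile.new("sample")
sample.binmode
sample.write("caf\xE9\r\nna\xEFve\r\n".b)
sample.close

# Read Windows-1252 bytes from disk and transcode them to UTF-8 strings.
content = File.read(sample.path, encoding: "Windows-1252:UTF-8")

# Split on LF or CRLF so no stray "\r" is left at the end of each line.
lines = content.split(/\r?\n/)
puts content.encoding
puts lines.inspect

# Where the two encodings differ: byte 0x80 is the euro sign in
# Windows-1252 but a C1 control character in ISO-8859-1.
euro    = "\x80".b.force_encoding("Windows-1252").encode("UTF-8")
control = "\x80".b.force_encoding("ISO-8859-1").encode("UTF-8")
```

Because the content is valid UTF-8 after reading, split no longer raises "invalid byte sequence in UTF-8".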

Upvotes: 3

Try to use Encoding::ISO_8859_1 instead.

Upvotes: -1
