Reputation: 9407
According to the file command on Mac OS X, I have a file with ISO-8859 encoding:
$ file filename.txt
filename.txt: ISO-8859 text, with CRLF line terminators
I try to read it with that encoding:
> filename = "/Users/myuser/Downloads/filename.txt"
> content = File.read(filename, encoding: "ISO-8859")
> content.encoding
=> #<Encoding:UTF-8>
It doesn't work: the string is still tagged as UTF-8. Consequently:
> content.split("\n")
ArgumentError: invalid byte sequence in UTF-8
Why doesn't it read the file as ISO-8859?
Upvotes: 0
Views: 1326
Reputation: 55758
With your code, Ruby emits the following warning when reading the file:
warning: Unsupported encoding ISO-8859 ignored
This is because ISO 8859 is not a single encoding but a family of variants (ISO 8859-1, ISO 8859-15, and so on). You need to specify the correct one explicitly, e.g.:
content = File.read(filename, encoding: "ISO-8859-1")
# or equivalently
content = File.read(filename, encoding: Encoding::ISO_8859_1)
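If you are unsure which names your Ruby build accepts, you can ask Ruby itself. The following is a minimal sketch; known_encoding? is a hypothetical helper written for this illustration, not part of Ruby. It relies on Encoding.find raising ArgumentError for names it does not recognize.

```ruby
# List the ISO 8859 variants known to this Ruby build.
iso_names = Encoding.name_list.grep(/\AISO-8859-\d+\z/)
puts iso_names

# Hypothetical helper: true if Ruby recognizes the encoding name.
# Encoding.find raises ArgumentError for unknown names.
def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

puts known_encoding?("ISO-8859")   # the bare family name
puts known_encoding?("ISO-8859-1") # a concrete variant
```

Checking the name up front avoids the silent fallback, since File.read only warns about an unsupported name and then continues with the default encoding.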
When dealing with text files produced on Windows machines (which the CRLF line endings hint at), you might want to use Encoding::Windows_1252 (or "Windows-1252") instead. It covers all printable characters of ISO 8859-1, adds further printable characters in the range 0x80 to 0x9F, and used to be the default encoding used by many Windows programs and the system itself.
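To get UTF-8 strings that work with split, you can let Ruby transcode while reading by passing an "external:internal" encoding pair. The following is a sketch under assumptions: read_lines_as_utf8 is a hypothetical helper, the sample bytes are made up for the demo, and Windows-1252 is only a guess at the file's real encoding.

```ruby
require "tempfile"

# Hypothetical helper: decode a file from `source_encoding` and
# return its lines as UTF-8 strings.
def read_lines_as_utf8(path, source_encoding = "Windows-1252")
  # "external:internal" makes Ruby decode the bytes as `source_encoding`
  # and transcode the resulting string to UTF-8.
  content = File.read(path, encoding: "#{source_encoding}:UTF-8")
  # Accept both CRLF and LF line endings.
  content.split(/\r?\n/)
end

# Demo with made-up data: two words containing the Windows-1252 bytes
# 0xE9 and 0xEF, with CRLF line endings.
lines = Tempfile.create("sample") do |f|
  f.binmode
  f.write("caf\xE9\r\nna\xEFve\r\n".b)
  f.flush
  read_lines_as_utf8(f.path)
end

p lines
p lines.first.encoding
```

Transcoding on read keeps the rest of the program in UTF-8, so later string operations do not raise "invalid byte sequence" errors.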
Upvotes: 3
Reputation: 1189
Try using Encoding::ISO_8859_1 instead.
Upvotes: -1