Reputation: 2353
I need to read an external file in Ruby.
Running file -i
locally shows
text/plain; charset=utf-16le
I open it with Ruby's CSV using the separator '\t', and a row shows as:
<CSV::Row "\xFF\xFEC\x00a\x00n\x00d\x00i\x00d\x00a\x00t\x00e\x00 \x00n\x00u\
...
row.to_s produces \x000\x000\x000\x001\x00\t\x00E\x00D\x00O
Running puts row
shows the data correctly:
0001 EDOARDO A
...
(the values also show legibly in vim and LibreOffice Calc)
Any suggestions how to get the data in ruby? I've tried various combinations of opening the CSV with external_encoding: 'utf-16le', internal_encoding: "utf-8"
etc., but puts
is the only thing that gives legible values.
The CSV object also reports its encoding as ASCII-8BIT:
<#CSV io_type:StringIO encoding:ASCII-8BIT lineno:0 col_sep:"\\t" row_sep:"\n" quote_char:"\"" headers:true>
The file itself was produced as an XLS file. I have uploaded an edited version here (edited in gvim).
Upvotes: 0
Views: 277
Reputation: 2353
The issue was that I was reading from a Paperclip attachment, which needed to have the encoding set (overridden) before saving.
Adding s3_headers in the model worked:
has_attached_file :attachment, s3_headers: lambda { |attachment|
  {
    'content-Type' => 'text/csv; charset=utf-16le'
  }
}
Thanks to Julien for tipping me off that the issue was related to the Paperclip attachment (his solution works when reading the file directly).
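For anyone who ends up with the raw bytes already in memory (for example a binary, ASCII-8BIT string like the StringIO contents in the question), a minimal sketch of decoding them by hand might look like this. The helper name parse_utf16le_tsv and the two-column sample are made up for illustration, and it assumes the data really is UTF-16LE with an optional leading BOM:

```ruby
require "csv"

# Hypothetical helper (not part of CSV or Paperclip): decode raw UTF-16LE
# bytes to UTF-8 and parse them as tab-separated values with a header row.
def parse_utf16le_tsv(raw_bytes)
  # Relabel the bytes as UTF-16LE, then transcode to UTF-8.
  text = raw_bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # Drop the byte order mark (the "\xFF\xFE" seen at the start of the header).
  text = text.delete_prefix("\uFEFF")
  CSV.parse(text, col_sep: "\t", headers: true)
end

# Sample bytes built in place of a real download; .b marks them as binary.
sample_text = "\uFEFFCandidate number\tFirst name\r\n0001\tEDOARDO\r\n"
raw = sample_text.encode(Encoding::UTF_16LE).b

table = parse_utf16le_tsv(raw)
first_name = table.first["First name"]
puts first_name
```

This sidesteps the storage headers entirely, so it is a workaround for reading the data rather than a replacement for setting the content type on the attachment.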
Upvotes: 0
Reputation: 2319
This is working fine for me:
require 'csv'
CSV.foreach("file.xls", encoding: "UTF-16LE:UTF-8", col_sep: "\t") do |row|
  puts row.inspect
end
This will produce the following output:
["Candidate number", "First name", "Last name", "Date of birth", "Preparation centre", "Result", "Score", "Reading and Writing", "Listening", "Speaking", "Result enquiry", "Raised on", "Raised by", "Enquiry status", "Withdrawn on", "Withdrawn by", nil]
["0001", "EDOARDO", "AGNEW", "20/01/2001", "Fondazione Istituto Massimo", "RY5-G8-Y2", "-", nil, nil, nil, "-", "00000000", nil, nil, "00000000", nil, nil]
As you can see, each row is an array of strings, one per column in the document.
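One detail that may be worth checking: the call above passes col_sep: "\t" in double quotes, while the CSV inspect output in the question shows col_sep:"\\t", which suggests a literal backslash followed by t rather than a tab. Whether that contributed to the original problem is a guess, but the difference between the two quoting styles can be sketched like this (the variable names and the sample line are only for illustration):

```ruby
require "csv"

single = '\t' # single quotes: two characters, a backslash and the letter t
double = "\t" # double quotes: one character, an actual TAB

# A tab-separated sample line using values from the output above.
line = "0001\tEDOARDO\tAGNEW"

with_tab     = CSV.parse_line(line, col_sep: double)
with_literal = CSV.parse_line(line, col_sep: single)

puts with_tab.inspect
puts with_literal.inspect
```

With the real tab the line splits into three fields; with the literal separator nothing matches, so the whole line comes back as a single field.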
Upvotes: 1