What does rb:bom|utf-8 mean in CSV.open in Ruby?

Question

What does the 'rb:bom|utf-8' mean in:

CSV.open(csv_name, 'rb:bom|utf-8', headers: true, return_headers: true) do |csv|

I can understand that:

r means read
bom is a file format with \xEF\xBB\xBF at the start of a file to indicate endianness.
utf-8 is a file format

But:

I don't know how they fit together, or why it is necessary to write all of this just to read a CSV.
I'm struggling to find the documentation for this. It doesn't seem to be documented in
https://ruby-doc.org/stdlib-2.6.1/libdoc/csv/rdoc/CSV.html

Update:

Found some very useful documentation: https://ruby-doc.org/core-2.6.3/IO.html#method-c-new-label-Open+Mode

tadman · Accepted Answer

When reading a text file in Ruby you need to specify the encoding or it will revert to the default, which might be wrong.
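As a rough illustration of that fallback, the sketch below opens the same file with and without an encoding in the mode string and asks Ruby which external encoding it chose. The temp file and variable names are made up for the example.

```ruby
require 'tempfile'

# Hypothetical scratch file, only here so the example is self-contained.
file = Tempfile.new('encoding_demo')
file.write("hello\n")
file.close

# No encoding in the mode string: Ruby falls back to the process-wide default.
default_enc = File.open(file.path, 'r') { |f| f.external_encoding }

# Encoding named after the colon: Ruby uses it regardless of the default.
explicit_enc = File.open(file.path, 'r:utf-8') { |f| f.external_encoding }

puts "default:  #{default_enc}"
puts "explicit: #{explicit_enc}"
```

The default depends on the environment (locale settings, command-line flags), which is why relying on it can give different results on different machines.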

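If the CSV files you're reading start with a byte order mark (BOM), you need to open them that way.

Opening the file as plain UTF-8 doesn't strip the BOM, so it ends up as a stray invisible character at the start of the first value, typically your first header. The bom|utf-8 part of the mode string tells Ruby to check for a BOM, skip past it if one is present, and then treat the rest of the data as UTF-8. The b in rb just opens the file in binary mode, so no newline conversion is applied.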
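To see the difference in practice, here is a minimal sketch that writes a small CSV starting with a BOM and reads its headers both ways. The file contents and variable names are invented for the example; only the mode strings come from the question.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data: a UTF-8 CSV whose first character is the BOM (U+FEFF).
file = Tempfile.new(['bom_demo', '.csv'])
file.write("\uFEFFname,city\nAda,London\n")
file.close

# Plain UTF-8: the BOM is kept and becomes part of the first header.
plain_headers = CSV.open(file.path, 'rb:utf-8', headers: true) do |csv|
  csv.shift.headers
end

# bom|utf-8: Ruby strips the BOM before the CSV parser sees the data.
bom_headers = CSV.open(file.path, 'rb:bom|utf-8', headers: true) do |csv|
  csv.shift.headers
end

p plain_headers
p bom_headers
```

With the plain UTF-8 mode, a lookup like row['name'] would quietly return nil, because the real header has the invisible BOM character in front of it.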
