ujifgc

Reputation: 2245

How to write a BOM marker to a file in Ruby

I have some working code that uses a crutch to add a BOM marker to a new file.

  #writing
  File.open name, 'w', 0644 do |file|
    file.write "\uFEFF"
    file.write @data
  end

  #reading
  File.open name, 'r:bom|utf-8' do |file|
    file.read
  end

Is there a way to add the marker automatically, without writing the cryptic "\uFEFF" before the data? Something like File.open name, 'w:bom', perhaps? (As it stands, that mode has no effect.)

Upvotes: 17

Views: 10753

Answers (4)

chinacheng

Reputation: 211

Try this:

# read content from the old file
original_content = File.read(file_path)
# define the UTF-8 BOM
bom = "\xEF\xBB\xBF"
# new file: add the BOM at the head of the content
File.open(new_file_path, "w:UTF-8") do |file|
  file.write(bom + original_content)
end
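
One thing the snippet does not cover is an input file that already starts with a BOM, which would then end up with two. A possible variation, assuming the input is UTF-8: read with the r:bom|utf-8 mode from the question so that an existing BOM is dropped, then write a fresh one. The method name copy_with_single_bom and the file names are made up for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: copy a UTF-8 text file so that the copy starts
# with exactly one BOM, whether or not the source already had one.
def copy_with_single_bom(src, dst)
  # "r:bom|utf-8" skips a leading BOM if one is present
  content = File.read(src, mode: "r:bom|utf-8")
  File.open(dst, "w:utf-8") do |file|
    file.write("\uFEFF" + content)
  end
end

src = File.join(Dir.tmpdir, "bom_src.txt")
dst = File.join(Dir.tmpdir, "bom_dst.txt")
File.binwrite(src, "\xEF\xBB\xBFabc")   # source that already has a BOM
copy_with_single_bom(src, dst)
p File.binread(dst).bytes.first(4)      # => [239, 187, 191, 97]
```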

Upvotes: 0

aaron

Reputation: 2056

A trimmed version of @knut's answer:

File.open("file_utf8.txt", "w:utf-8") do |f|
    f << "\xEF\xBB\xBF".force_encoding("UTF-8")
    f << EXAMPLE_TEXT 
end

Upvotes: 1

knut

Reputation: 27855

**** This answer led to a new gem: file_with_bom ****

I had a similar problem in the past, and I extended File.open with additional encoding variants for the w-mode:

class File
  # Byte order marks, keyed by encoding
  BOM_LIST_hex = {
    Encoding::UTF_8    => "\xEF\xBB\xBF", # U+FEFF encoded as UTF-8
    Encoding::UTF_16BE => "\xFE\xFF",
    Encoding::UTF_16LE => "\xFF\xFE",
    Encoding::UTF_32BE => "\x00\x00\xFE\xFF",
    Encoding::UTF_32LE => "\xFF\xFE\x00\x00",
  }
  BOM_LIST_hex.freeze

  # BOM for the given encoding (nil if the encoding is not in the list)
  def utf_bom_hex(encoding = external_encoding)
    BOM_LIST_hex[encoding]
  end

  class << self
    alias :open_old :open
    def open(filename, mode_string = 'r', options = {}, &block)
      # work on copies so the caller's arguments stay untouched
      mode_string = mode_string.dup
      options = options.dup
      # check for bom-flag in mode_string
      options[:bom] = true if mode_string.sub!(/-bom/i, '')
      use_bom = options.delete(:bom)

      # forward the remaining options as keyword arguments
      f = open_old(filename, mode_string, **options)
      if use_bom
        expected = f.utf_bom_hex
        case mode_string
        # reading with r:bom|utf-8 is built in since 1.9.2
        when /\Ar/ # read mode -> skip the BOM
          if expected
            bom = f.read(expected.bytesize)
            # return to position 0 if the file does not start with a BOM
            f.rewind unless bom && bom == expected.dup.force_encoding(bom.encoding)
          end
        when /\Aw/ # write mode -> write the BOM first
          f << expected.dup.force_encoding(f.external_encoding) if expected
        end
      end

      # like the original File.open: return the file without a block,
      # otherwise yield it and close it even if the block raises
      return f unless block_given?
      begin
        yield f
      ensure
        f.close
      end
    end
  end
end #File

Test code:

EXAMPLE_TEXT = 'some content öäü'
File.open("file_utf16le.txt", "w:utf-16le-bom"){|f| f << EXAMPLE_TEXT }
File.open("file_utf16le.txt", "r:utf-16le-bom:utf-8"){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8", :bom => true){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8"){|f| p f.read }

File.open("file_utf8.txt", "w:utf-8", :bom => true){|f| f << EXAMPLE_TEXT }
File.open("file_utf8.txt", "r:utf-8", :bom => true){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8-bom"){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8"){|f| p f.read }

Some remarks:

  • The code is from pre-1.9 times (but it still works).
  • I used -bom as the BOM indicator (Ruby 1.9 uses bom|, as in r:bom|utf-8).

Some fixes that would improve it:

  • use bom| instead of -bom
  • use the standard r:bom|utf-8 style for reading
  • make it work with both Ruby 1.8 and 1.9

Perhaps I will find some time tomorrow to refactor my code and provide it as a gem.
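
As a rough sketch of the first two fixes without reopening the File class, the flag could be handled in a small standalone method. The name open_with_bom is hypothetical and is not the API of the file_with_bom gem. The sketch assumes the flag is written as a bom| prefix and that a block is always given. The BOM is written as "\uFEFF", which Ruby transcodes to the file's external encoding.

```ruby
require "tmpdir"

# Hypothetical helper, not part of Ruby or of the file_with_bom gem.
# Accepts modes such as "w:bom|utf-8" and always expects a block.
def open_with_bom(path, mode)
  clean_mode = mode.sub(/bom\|/i, "")   # mode string without the flag
  wants_bom  = clean_mode != mode

  if wants_bom && clean_mode.start_with?("w")
    File.open(path, clean_mode) do |file|
      file.write("\uFEFF")              # transcoded to the external encoding
      yield file
    end
  else
    # Reading (or no flag): the mode is passed through unchanged,
    # since Ruby handles "r:bom|utf-8" itself.
    File.open(path, mode) { |file| yield file }
  end
end

path = File.join(Dir.tmpdir, "open_with_bom_example.txt")
open_with_bom(path, "w:bom|utf-16le") { |f| f << "hi" }
p File.binread(path).bytes   # => [255, 254, 104, 0, 105, 0]
```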

Upvotes: 11

Michael Kohl

Reputation: 66837

Alas, I think your manual approach is the way to go; at least I don't know of a better one:

http://blog.grayproductions.net/articles/miscellaneous_m17n_details

To quote from JEG2's article:

Ruby 1.9 won't automatically add a BOM to your data, so you're going to need to take care of that if you want one. Luckily, it's not too tough. The basic idea is just to print the bytes needed at the beginning of a file.
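
A minimal sketch of that idea, assuming the file is written in one of the Unicode encodings: the BOM is the character U+FEFF encoded in the target encoding, so it can be derived with String#encode instead of a hand-written byte table. The helper names bom_for and write_with_bom are made up for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: the BOM is U+FEFF encoded in the target encoding.
def bom_for(encoding)
  "\uFEFF".encode(encoding)
end

# Hypothetical helper: write the BOM first, then the data.
# The data is transcoded to the file's external encoding on write.
def write_with_bom(path, data, encoding = "UTF-8")
  File.open(path, "w:#{encoding}") do |file|
    file.write(bom_for(encoding))
    file.write(data)
  end
end

path = File.join(Dir.tmpdir, "bom_example.txt")
write_with_bom(path, "some content", "UTF-16LE")
p File.binread(path).bytes.first(2)   # => [255, 254]
p bom_for("UTF-8").bytes              # => [239, 187, 191]
```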

Upvotes: 5
