Arian Faurtosh

Reputation: 18521

Trying to unzip a 600 MB tgz with Ruby gives an integer-out-of-range error

I'm trying to untar a tgz file with the following code:

tar_extract.each do |entry|
  entry_filename = File.basename(entry.full_name)
  next if entry.directory?                  # skip directories
  next unless entry.file?                   # skip anything that isn't a regular file
  next if entry.full_name.start_with?('/')  # skip absolute paths

  file_path = File.join(working_directory, entry_filename)
  puts "Writing file: #{file_path}"

  File.open(file_path, 'wb') do |f|
    f.write(entry.read) # reads the whole entry into memory in one call
  end

  bytes = File.size(file_path)

  puts "Successfully wrote file with #{bytes} bytes"
end

tar_extract.close
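
For reference, tar_extract is a Gem::Package::TarReader (as the stack trace below shows); it is set up roughly like this, with the path being a placeholder:

require 'rubygems/package'
require 'zlib'

# Placeholder path; the real archive is the ~600 MB tgz mentioned in the title.
tar_extract = Gem::Package::TarReader.new(Zlib::GzipReader.open('/files/archive.tgz'))
tar_extract.rewind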

This code usually works fine; however, when a file inside the TGZ is too big, I get an integer-out-of-range error.

Writing file: /files/working_dir/test1.tar.gz  
Successfully wrote file with 244704472 bytes 

Writing file: /files/working_dir/test2.sql
RangeError: integer 2556143960 too big to convert to `int'
from /usr/local/rvm/rubies/ruby-2.1.1/lib/ruby/site_ruby/2.1.0/rubygems/package/tar_reader/entry.rb:126:in `read'

I'm not sure what else I should try.

Looking at the RubyGems source, this is the method that raises. With len left as nil it tries to read the entire remaining entry (2556143960 bytes here) in a single @io.read call, and that length does not fit in a 32-bit signed int:

  ##
  # Reads +len+ bytes from the tar file entry, or the rest of the entry if
  # nil

  def read(len = nil)
    check_closed

    return nil if @read >= @header.size

    len ||= @header.size - @read
    max_read = [len, @header.size - @read].min

    ret = @io.read max_read
    @read += ret.size

    ret
  end

Upvotes: 1

Views: 189

Answers (2)

Arian Faurtosh

Reputation: 18521

Using Joe's guidance, I was able to figure it out.

I changed the File block to:

File.open(file_path, 'wb') do |f|
  until entry.eof?
    f.write(entry.read(16000)) # 16 KB
  end
end

The reason I chose 16 KB is that I ran a series of benchmarks:

require 'benchmark'

b = Benchmark.measure do
  File.open(file_path, 'wb') do |f|
    until entry.eof?
      f.write(entry.read(16000)) # 16 KB
    end
  end
end

bytes = File.size(file_path)
puts("Successfully wrote file with #{bytes} bytes in #{b.real}")

After doing some research, it seems each disk has its own optimal chunk size. I benchmarked two files, one of about 211 MB and one of about 6.6 GB. Results are below; 16 KB to 64 KB turned out to be the optimal range for my disk (a sketch of how such a sweep can be scripted follows the results).

2 GB // 2047483648

Successfully wrote file with 7021620216 bytes in 60.360527059

Successfully wrote file with 220613778 bytes in 2.084798686

1 GB // 1073741824

Successfully wrote file with 7021620216 bytes in 42.345642806
Successfully wrote file with 7021620216 bytes in 48.941375145
Successfully wrote file with 7021620216 bytes in 51.501044608
Successfully wrote file with 7021620216 bytes in 58.81474911

Successfully wrote file with 220613778 bytes in 1.57968424
Successfully wrote file with 220613778 bytes in 2.28171993
Successfully wrote file with 220613778 bytes in 5.905203041
Successfully wrote file with 220613778 bytes in 16.944126945

4 KB // 4000

Successfully wrote file with 7021620216 bytes in 43.39409191
Successfully wrote file with 7021620216 bytes in 44.572620161
Successfully wrote file with 7021620216 bytes in 48.510513964
Successfully wrote file with 7021620216 bytes in 53.839022034

Successfully wrote file with 220613778 bytes in 1.982647292
Successfully wrote file with 220613778 bytes in 2.071772595
Successfully wrote file with 220613778 bytes in 2.132004983
Successfully wrote file with 220613778 bytes in 2.221654993

8 KB // 8000

Successfully wrote file with 7021620216 bytes in 41.851550514
Successfully wrote file with 7021620216 bytes in 45.611952667
Successfully wrote file with 7021620216 bytes in 50.068614205
Successfully wrote file with 7021620216 bytes in 50.726276706

Successfully wrote file with 220613778 bytes in 1.941246687
Successfully wrote file with 220613778 bytes in 2.456356439
Successfully wrote file with 220613778 bytes in 2.56323527
Successfully wrote file with 220613778 bytes in 3.756049832

16 KB // 16000

Successfully wrote file with 7021620216 bytes in 36.929413152
Successfully wrote file with 7021620216 bytes in 36.486866289
Successfully wrote file with 7021620216 bytes in 36.743103326
Successfully wrote file with 7021620216 bytes in 37.019910405

Successfully wrote file with 220613778 bytes in 1.504792162
Successfully wrote file with 220613778 bytes in 1.620161067
Successfully wrote file with 220613778 bytes in 1.622070414
Successfully wrote file with 220613778 bytes in 1.698627821


32 KB // 32000

Successfully wrote file with 7021620216 bytes in 35.802759912
Successfully wrote file with 7021620216 bytes in 38.775857377
Successfully wrote file with 7021620216 bytes in 39.116311496
Successfully wrote file with 7021620216 bytes in 39.126005469

Successfully wrote file with 220613778 bytes in 1.696821094
Successfully wrote file with 220613778 bytes in 1.773727215
Successfully wrote file with 220613778 bytes in 4.023144931
Successfully wrote file with 220613778 bytes in 4.08615266


64 KB // 64000

Successfully wrote file with 7021620216 bytes in 36.732343382
Successfully wrote file with 7021620216 bytes in 37.914365658
Successfully wrote file with 7021620216 bytes in 38.336098907
Successfully wrote file with 7021620216 bytes in 39.146334479

Successfully wrote file with 220613778 bytes in 1.662487522
Successfully wrote file with 220613778 bytes in 1.674177939
Successfully wrote file with 220613778 bytes in 1.745556917
Successfully wrote file with 220613778 bytes in 1.784492717
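
The sweep itself can be scripted; here is a rough sketch (tgz_path is a placeholder for the archive path, working_directory is as above, and the archive is reopened for every chunk size so each run starts from a fresh stream):

require 'benchmark'
require 'rubygems/package'
require 'zlib'

CHUNK_SIZES = [4_000, 8_000, 16_000, 32_000, 64_000] # bytes per read

CHUNK_SIZES.each do |chunk_size|
  # Reopen the archive for every chunk size so each run reads from the start.
  tar = Gem::Package::TarReader.new(Zlib::GzipReader.open(tgz_path))
  tar.rewind

  tar.each do |entry|
    next unless entry.file?

    file_path = File.join(working_directory, File.basename(entry.full_name))
    time = Benchmark.measure do
      File.open(file_path, 'wb') do |f|
        f.write(entry.read(chunk_size)) until entry.eof?
      end
    end
    puts "chunk=#{chunk_size}: wrote #{File.size(file_path)} bytes in #{time.real}"
  end

  tar.close
end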

Upvotes: 0

Joe

Reputation: 42646

You can likely fix this by changing this:

  File.open(file_path, 'wb') do |f|
    f.write(entry.read)
  end

to a loop that calls entry.read with a length argument giving the maximum number of bytes to read in that iteration. You may need to split the read and the write into two steps, since entry.read can return nil once there is no more data to process.
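
For example, something like this (a sketch only; the 1 MB chunk size is arbitrary and the surrounding variables are as in your question):

  File.open(file_path, 'wb') do |f|
    # Read in bounded chunks; the loop ends when entry.read returns nil at the end of the entry.
    while (chunk = entry.read(1024 * 1024))
      f.write(chunk)
    end
  end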

Upvotes: 1
