Reputation: 18521
I'm trying to untar a tgz file with the following code:
tar_extract.each do |entry|
  entry_filename = File.basename(entry.full_name)
  next if entry.directory? # don't unzip directories
  next if !entry.file? # if it's not a file skip
  next if entry.full_name.starts_with?('/') # another check
  file_path = File.join(working_directory, entry_filename)
  puts "Writing file: #{file_path}"
  File.open(file_path, 'wb') do |f|
    f.write(entry.read)
  end
  bytes = File.size(file_path)
  puts "Successfully wrote file with #{bytes} bytes"
end
tar_extract.close
This code usually works, but when a file inside the TGZ is too big, I get an integer out of range error:
Writing file: /files/working_dir/test1.tar.gz
Successfully wrote file with 244704472 bytes
Writing file: /files/working_dir/test2.sql
RangeError: integer 2556143960 too big to convert to `int'
from /usr/local/rvm/rubies/ruby-2.1.1/lib/ruby/site_ruby/2.1.0/rubygems/package/tar_reader/entry.rb:126:in `read'
I'm not sure what else I should try.
Looking at the RubyGems source (tar_reader/entry.rb), this is the method in question:
##
# Reads +len+ bytes from the tar file entry, or the rest of the entry if
# nil
def read(len = nil)
  check_closed
  return nil if @read >= @header.size
  len ||= @header.size - @read
  max_read = [len, @header.size - @read].min
  ret = @io.read max_read
  @read += ret.size
  ret
end
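As a quick sanity check on the numbers in the error output, the snippet below compares the two entry sizes against the largest signed 32-bit integer. It only shows that the failing size is past that limit. That the reader behind @io converts the requested length to a C int is an assumption based on the wording of the error message, not something the source above shows.

```ruby
# Largest value a signed 32-bit C int can hold: 2147483647
INT_MAX = 2**31 - 1

first_entry  = 244_704_472   # test1.tar.gz, which extracted fine
second_entry = 2_556_143_960 # test2.sql, which raised the RangeError

puts first_entry <= INT_MAX  # true
puts second_entry <= INT_MAX # false
```

If that assumption holds, calling entry.read with no argument asks for the whole entry in one request, and any entry larger than about 2 GB would fail the same way.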
Upvotes: 1
Views: 189
Reputation: 18521
Using Joe's guidance I was able to figure it out.
I changed the File.open block to:
File.open(file_path, 'wb') do |f|
  until entry.eof?
    f.write(entry.read(16000)) # 16 KB
  end
end
I chose 16 KB because I ran a series of benchmarks:
require 'benchmark'

# Time the chunked write of a single entry
b = Benchmark.measure do
  File.open(file_path, 'wb') do |f|
    until entry.eof?
      f.write(entry.read(16000)) # 16 KB
    end
  end
end
bytes = File.size(file_path)
puts("Successfully wrote file with #{bytes} bytes in #{b.real}")
After doing some research, it seems each disk has its own optimal chunk size. I used two files for the benchmark, one of 211 MB and one of 6.6 GB. Results are below; chunk sizes from 16 KB to 64 KB performed best on my disk.
2GB // 2047483648
Successfully wrote file with 7021620216 bytes in 60.360527059
Successfully wrote file with 220613778 bytes in 2.084798686
1GB // 1073741824
Successfully wrote file with 7021620216 bytes in 42.345642806
Successfully wrote file with 7021620216 bytes in 48.941375145
Successfully wrote file with 7021620216 bytes in 51.501044608
Successfully wrote file with 7021620216 bytes in 58.81474911
Successfully wrote file with 220613778 bytes in 1.57968424
Successfully wrote file with 220613778 bytes in 2.28171993
Successfully wrote file with 220613778 bytes in 5.905203041
Successfully wrote file with 220613778 bytes in 16.944126945
4KB // 4000
Successfully wrote file with 7021620216 bytes in 43.39409191
Successfully wrote file with 7021620216 bytes in 44.572620161
Successfully wrote file with 7021620216 bytes in 48.510513964
Successfully wrote file with 7021620216 bytes in 53.839022034
Successfully wrote file with 220613778 bytes in 1.982647292
Successfully wrote file with 220613778 bytes in 2.071772595
Successfully wrote file with 220613778 bytes in 2.132004983
Successfully wrote file with 220613778 bytes in 2.221654993
8KB // 8000
Successfully wrote file with 7021620216 bytes in 41.851550514
Successfully wrote file with 7021620216 bytes in 45.611952667
Successfully wrote file with 7021620216 bytes in 50.068614205
Successfully wrote file with 7021620216 bytes in 50.726276706
Successfully wrote file with 220613778 bytes in 1.941246687
Successfully wrote file with 220613778 bytes in 2.456356439
Successfully wrote file with 220613778 bytes in 2.56323527
Successfully wrote file with 220613778 bytes in 3.756049832
16KB // 16000
Successfully wrote file with 7021620216 bytes in 36.929413152
Successfully wrote file with 7021620216 bytes in 36.486866289
Successfully wrote file with 7021620216 bytes in 36.743103326
Successfully wrote file with 7021620216 bytes in 37.019910405
Successfully wrote file with 220613778 bytes in 1.504792162
Successfully wrote file with 220613778 bytes in 1.620161067
Successfully wrote file with 220613778 bytes in 1.622070414
Successfully wrote file with 220613778 bytes in 1.698627821
32KB // 32000
Successfully wrote file with 7021620216 bytes in 35.802759912
Successfully wrote file with 7021620216 bytes in 38.775857377
Successfully wrote file with 7021620216 bytes in 39.116311496
Successfully wrote file with 7021620216 bytes in 39.126005469
Successfully wrote file with 220613778 bytes in 1.696821094
Successfully wrote file with 220613778 bytes in 1.773727215
Successfully wrote file with 220613778 bytes in 4.023144931
Successfully wrote file with 220613778 bytes in 4.08615266
64KB // 64000
Successfully wrote file with 7021620216 bytes in 36.732343382
Successfully wrote file with 7021620216 bytes in 37.914365658
Successfully wrote file with 7021620216 bytes in 38.336098907
Successfully wrote file with 7021620216 bytes in 39.146334479
Successfully wrote file with 220613778 bytes in 1.662487522
Successfully wrote file with 220613778 bytes in 1.674177939
Successfully wrote file with 220613778 bytes in 1.745556917
Successfully wrote file with 220613778 bytes in 1.784492717
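For anyone wanting to repeat this comparison on their own disk, here is a minimal sketch of how the runs could be scripted. The helper name timed_copy, the small in-memory payload, and the temp files are all stand-ins for illustration; the numbers above came from real tar entries, not from this script.

```ruby
require 'benchmark'
require 'stringio'
require 'tempfile'

# Hypothetical helper: copies +source+ into the file at +path+ in chunks of
# +chunk_size+ bytes and returns the elapsed wall-clock time in seconds.
def timed_copy(source, path, chunk_size)
  Benchmark.realtime do
    File.open(path, 'wb') do |f|
      # read(len) returns nil at end of data, which ends the loop
      while (chunk = source.read(chunk_size))
        f.write(chunk)
      end
    end
  end
end

data = 'x' * 1_000_000 # small stand-in payload, not the 6.6 GB file
results = {}

[4_000, 8_000, 16_000, 32_000, 64_000].each do |chunk_size|
  Tempfile.create('chunk_bench') do |tmp|
    seconds = timed_copy(StringIO.new(data), tmp.path, chunk_size)
    bytes = File.size(tmp.path)
    results[chunk_size] = { bytes: bytes, seconds: seconds }
    puts "#{chunk_size}: wrote #{bytes} bytes in #{seconds}"
  end
end
```

With a payload this small the timings are mostly noise, so a real comparison would need large inputs and several runs per chunk size, as in the results above.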
Upvotes: 0
Reputation: 42646
You can likely fix this by changing this:
File.open(file_path, 'wb') do |f|
  f.write(entry.read)
end
to a loop in which you call entry.read with a parameter giving the maximum number of bytes to process in that iteration. You might have to split this into two calls, as entry.read may return nil, indicating there is no more data to process.
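A minimal sketch of such a loop is below. The method name copy_in_chunks and the 16 KiB chunk size are made up for illustration, and a StringIO stands in for the tar entry so the snippet runs on its own. It relies only on read(len) returning nil once the data is exhausted, which is what the read method quoted in the question does.

```ruby
require 'stringio'

CHUNK_SIZE = 16 * 1024 # hypothetical chunk size; tune for your system

# Copies everything from +source+ to +dest+ in fixed-size chunks and
# returns the number of bytes copied. +source+ only needs a read(len)
# method that returns nil at end of data.
def copy_in_chunks(source, dest, chunk_size = CHUNK_SIZE)
  total = 0
  # Assigning inside the condition handles the nil return and the write
  # in a single pass, so no separate eof check is needed.
  while (chunk = source.read(chunk_size))
    dest.write(chunk)
    total += chunk.bytesize
  end
  total
end

# Stand-in for a tar entry so the sketch is self-contained
source = StringIO.new('x' * 50_000)
dest = StringIO.new
copied = copy_in_chunks(source, dest)
puts "Copied #{copied} bytes"
```

In the question's code, the equivalent call would be copy_in_chunks(entry, f) inside the File.open block.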
Upvotes: 1