I'm trying to reuse the following code to create a tarball:
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","w")
Gem::Package::TarWriter.new(tarfile) do |tar|
  Dir[File.join(path, "**/*")].each do |file|
    mode = File.stat(file).mode
    relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
    if File.directory?(file)
      tar.mkdir relative_file, mode
    else
      tar.add_file relative_file, mode do |tf|
        File.open(file, "rb") { |f| tf.write f.read }
      end
    end
  end
end
tarfile.rewind
tarfile
It works fine as long as only small folders are involved, but anything large fails with the following error:
Error: Your application used more memory than the safety cap
How can I do it in chunks to avoid the memory problems?
It looks like the problem could be in this line:
File.open(file, "rb") { |f| tf.write f.read }
You are "slurping" your input file by calling f.read. Slurping means the entire file is read into memory at once, which doesn't scale at all; it happens because you call read without a length.
Instead, I'd read and write the file in blocks so memory usage stays constant. This reads in blocks of roughly 1MB (1024 * 1000 bytes); adjust that to your own needs:
BLOCKSIZE_TO_READ = 1024 * 1000
File.open(file, "rb") do |fi|
  while buffer = fi.read(BLOCKSIZE_TO_READ)
    tf.write buffer
  end
end
Here's what the documentation says about IO#read:
If length is a positive integer, it try to read length bytes without any conversion (binary mode). It returns nil or a string whose length is 1 to length bytes. nil means it met EOF at beginning. The 1 to length-1 bytes string means it met EOF after reading the result. The length bytes string means it doesn’t meet EOF. The resulted string is always ASCII-8BIT encoding.
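To see those return values concretely, here is a small sketch. The helper name chunk_report is hypothetical, and the 10-byte file and 4-byte block size are arbitrary choices. It reads a file in fixed-size blocks the same way the loop above does, records each chunk's length and encoding, and confirms that read(length) returns nil once it starts at EOF.

```ruby
require 'tmpdir'

# Hypothetical helper: read +path+ in +block_size+ chunks, recording each
# chunk's byte length and encoding, then report what one more read returns.
def chunk_report(path, block_size)
  sizes = []
  encodings = []
  at_eof = nil
  File.open(path, 'rb') do |f|
    # read(block_size) returns 1..block_size bytes, or nil when it starts at EOF.
    while (buffer = f.read(block_size))
      sizes << buffer.bytesize
      encodings << buffer.encoding
    end
    at_eof = f.read(block_size) # nothing left, so this is nil
  end
  { sizes: sizes, encodings: encodings.uniq, at_eof: at_eof }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'ten_bytes.bin')
  File.binwrite(path, '0123456789') # 10 bytes
  p chunk_report(path, 4)[:sizes]   # prints [4, 4, 2]
end
```

Only one block-sized buffer is alive at a time, which is why memory use stays flat no matter how large the input file is.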
Another problem: it looks like you're not opening the output file correctly:
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","w")
You're writing it in text mode because of "w". Instead, you need to write in binary mode, "wb", because a tar archive is binary data (on Windows, text mode would also translate line endings and corrupt it):
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","wb")
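A quick way to check what "wb" buys you is a sketch like the following; write_binary and SAMPLE_BYTES are hypothetical names. The "b" flag sets the IO's external encoding to ASCII-8BIT and turns off newline translation on Windows, so bytes such as NUL, CR/LF and 0xFF reach the disk unchanged.

```ruby
require 'tmpdir'

# Bytes a tar archive can legitimately contain: NUL padding, a CR/LF pair and a high byte.
SAMPLE_BYTES = "\x00\r\n\xFF".b

# Hypothetical helper: write +bytes+ to +path+ in binary mode and return the
# IO's external encoding together with the bytes that actually reached disk.
def write_binary(path, bytes)
  encoding = nil
  File.open(path, 'wb') do |f|
    encoding = f.external_encoding # "b" sets this to ASCII-8BIT
    f.write(bytes)
  end
  [encoding, File.binread(path)]
end

Dir.mktmpdir do |dir|
  encoding, data = write_binary(File.join(dir, 'sample.bin'), SAMPLE_BYTES)
  puts encoding == Encoding::ASCII_8BIT # prints true
  puts data == SAMPLE_BYTES             # prints true
end
```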
Rewriting the original code the way I'd want to see it results in:
require 'pathname'
require 'rubygems/package'
BLOCKSIZE_TO_READ = 1024 * 1000
def create_tarball(path)
  tar_filename = Pathname.new(path).realpath.to_path + '.tar'
  File.open(tar_filename, 'wb') do |tarfile|
    Gem::Package::TarWriter.new(tarfile) do |tar|
      Dir[File.join(path, '**/*')].each do |file|
        mode = File.stat(file).mode
        relative_file = file.sub(/^#{ Regexp.escape(path) }\/?/, '')
        if File.directory?(file)
          tar.mkdir(relative_file, mode)
        else
          tar.add_file(relative_file, mode) do |tf|
            # Stream the file in BLOCKSIZE_TO_READ chunks instead of slurping it.
            File.open(file, 'rb') do |f|
              while buffer = f.read(BLOCKSIZE_TO_READ)
                tf.write buffer
              end
            end
          end
        end
      end
    end
  end
  tar_filename
end
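Ruby's IO.copy_stream can also do the chunked copy for you: its destination only needs to respond to write, which the stream yielded by TarWriter#add_file does. Here is a sketch of the same method with the manual loop swapped for IO.copy_stream. The method name create_tarball_with_copy_stream is hypothetical, and this is an alternative to the loop above, not a fix for it.

```ruby
require 'pathname'
require 'rubygems/package'

# Hypothetical variant: same traversal as create_tarball, but IO.copy_stream
# moves each file into its tar entry in internally sized chunks.
def create_tarball_with_copy_stream(path)
  tar_filename = Pathname.new(path).realpath.to_path + '.tar'
  File.open(tar_filename, 'wb') do |tarfile|
    Gem::Package::TarWriter.new(tarfile) do |tar|
      Dir[File.join(path, '**/*')].each do |file|
        mode = File.stat(file).mode
        relative_file = file.sub(/^#{Regexp.escape(path)}\/?/, '')
        if File.directory?(file)
          tar.mkdir(relative_file, mode)
        else
          tar.add_file(relative_file, mode) do |tf|
            # tf only has to respond to #write for IO.copy_stream to use it.
            File.open(file, 'rb') { |f| IO.copy_stream(f, tf) }
          end
        end
      end
    end
  end
  tar_filename
end
```

The trade-off is that IO.copy_stream chooses its own buffer size, so you give up the BLOCKSIZE_TO_READ knob in exchange for less code.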
BLOCKSIZE_TO_READ should be at the top of your file, since it's a constant and a "tweakable": something more likely to change than the body of the code.
The method returns the path to the tarball, not an IO handle like the original code. The block form of File.open automatically closes the output when the block finishes, and any later open of that path starts at the beginning of the file, so there's no need to rewind. I much prefer passing around path strings to passing around IO handles for files.
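Because the method hands back a path, the caller reopens the archive itself and always starts at offset 0. As a hedged sketch, here is one way to list what ended up in a tarball using Gem::Package::TarReader; the helper name tar_entries is hypothetical, and it assumes a plain (uncompressed) tar file.

```ruby
require 'rubygems/package'

# Hypothetical helper: open the tarball at +tar_path+ from the start and
# return [name, size] pairs for every entry it contains.
def tar_entries(tar_path)
  File.open(tar_path, 'rb') do |io|
    reader = Gem::Package::TarReader.new(io)
    entries = []
    reader.each { |entry| entries << [entry.full_name, entry.header.size] }
    entries
  end
end
```

Usage would be along the lines of tar_entries(create_tarball('some/dir')), with no IO handle kept open between the two calls.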
I also wrapped some of the method parameters in parentheses. While parentheses aren't required around method parameters in Ruby, and some people eschew them, I think they make the code more maintainable by delimiting where the parameters start and end. They also avoid confusing Ruby when you pass both parameters and a block to a method, a well-known cause of bugs.
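The classic trap is that braces bind to the nearest method call while do...end binds to the outermost one. A minimal sketch with two hypothetical methods, outer and inner, shows which call receives the block in each case:

```ruby
# Hypothetical methods used only to show which call receives a block.
def outer(arg = nil)
  block_given? ? :outer_got_block : arg
end

def inner
  block_given? ? :inner_got_block : :inner_no_block
end

# Without parentheses, braces bind to the nearest call (inner)...
braces = outer inner { }
# ...but do/end binds to the outermost call (outer), so inner gets no block.
do_end = outer inner do end
# Parentheses make the intent explicit: the do/end block now goes to inner.
explicit = outer(inner do end)

p braces   # prints :inner_got_block
p do_end   # prints :outer_got_block
p explicit # prints :inner_got_block
```

In tar.add_file(relative_file, mode) do |tf| there is only one method call, so the block lands on add_file either way; the parentheses just make that obvious at a glance.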
minitar looks like it writes to a stream, so I don't think memory will be a problem. Here are the comment and definition of its pack method (as of May 21, 2013):
# A convenience method to pack files specified by +src+ into +dest+. If
# +src+ is an Array, then each file detailed therein will be packed into
# the resulting Archive::Tar::Minitar::Output stream; if +recurse_dirs+
# is true, then directories will be recursed.
#
# If +src+ is an Array, it will be treated as the argument to Find.find;
# all files matching will be packed.
def pack(src, dest, recurse_dirs = true, &block)
  Output.open(dest) do |outp|
    if src.kind_of?(Array)
      src.each do |entry|
        pack_file(entry, outp, &block)
        if dir?(entry) and recurse_dirs
          Dir["#{entry}/**/**"].each do |ee|
            pack_file(ee, outp, &block)
          end
        end
      end
    else
      Find.find(src) do |entry|
        pack_file(entry, outp, &block)
      end
    end
  end
end
Example from the README to write a tar:
# Packs everything that matches Find.find('tests')
File.open('test.tar', 'wb') { |tar| Minitar.pack('tests', tar) }
Example from the README to write a gzipped tar:
tgz = Zlib::GzipWriter.new(File.open('test.tgz', 'wb'))
# Warning: tgz will be closed!
Minitar.pack('tests', tgz)
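If you'd rather not add a dependency, a gzipped tarball can also be sketched with only the standard library. This is a swapped-in technique, not minitar: it wraps the output in Zlib::GzipWriter and uses Gem::Package::TarWriter#add_file_simple, which takes the entry size up front. That matters because a gzip stream can't seek, and TarWriter#add_file needs to seek back to fill in each header. The method name create_tgz and the CHUNK_SIZE value are hypothetical choices, and the sketch assumes files don't change size while they're being archived.

```ruby
require 'rubygems/package'
require 'zlib'

CHUNK_SIZE = 1024 * 1024 # arbitrary; tune as needed

# Hypothetical helper: write a gzipped tarball of everything under +path+ to
# +tgz_path+, streaming each file in CHUNK_SIZE pieces.
def create_tgz(path, tgz_path)
  Zlib::GzipWriter.open(tgz_path) do |gz|
    Gem::Package::TarWriter.new(gz) do |tar|
      Dir[File.join(path, '**/*')].each do |file|
        stat = File.stat(file)
        relative_file = file.sub(/^#{Regexp.escape(path)}\/?/, '')
        if stat.directory?
          tar.mkdir(relative_file, stat.mode)
        else
          # The header is written first, so the size must be known now;
          # writing more than stat.size bytes would raise an error.
          tar.add_file_simple(relative_file, stat.mode, stat.size) do |tf|
            File.open(file, 'rb') do |f|
              while (buffer = f.read(CHUNK_SIZE))
                tf.write(buffer)
              end
            end
          end
        end
      end
    end
  end
  tgz_path
end
```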