Reputation: 11
I'm working on a method in Ruby that creates a tar.gz file archiving the directories and files under a certain path (cdpath). It should behave like tar -C cdpath -zcf targzfile srcs, but without changing the CWD (to keep it thread-safe). I'm using Gem::Package::TarWriter to create the tar stream and writing it through Zlib::GzipWriter to compress.
Here's what I came up with (this is just a simple standalone test):
require 'rubygems/package'
require 'zlib'
require 'pathname'
require 'find'

cdpath = "/absolute/path/to/some/place"
targzfile = "test.tar.gz"
src = ["some-dir-name-at-cdpath"]
BLOCKSIZE_TO_READ = 1024 * 1000

path = Pathname.new(cdpath)
raise "path #{cdpath} should be an absolute path" unless path.absolute?
raise "path #{cdpath} should be a directory" unless File.directory? cdpath
raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
raise "no file or directory to tar" if !src || src.length == 0

src.each { |p| p.sub! /^/, "#{cdpath}/" }

File.open targzfile, 'wb' do |otargzfile|
  Zlib::GzipWriter.wrap otargzfile do |gz|
    Gem::Package::TarWriter.new gz do |tar|
      Find.find *src do |f|
        relative_path = f.sub "#{cdpath}/", ""
        mode = File.stat(f).mode
        if File.directory? f
          tar.mkdir relative_path, mode
        else
          File.open f, 'rb' do |rio|
            tar.add_file relative_path, mode do |tio|
              tio.write rio.read
            end
          end
        end
      end
    end
  end
end
However, I'm hitting the following exception and I can't seem to figure out what I'm doing wrong.
/usr/lib/ruby/2.1.0/rubygems/package/tar_writer.rb:108:in `add_file': Gem::Package::NonSeekableIO (Gem::Package::NonSeekableIO)
from tartest2.rb:29:in `block (5 levels) in <main>'
from tartest2.rb:28:in `open'
from tartest2.rb:28:in `block (4 levels) in <main>'
from /usr/lib/ruby/2.1.0/find.rb:48:in `block (2 levels) in find'
from /usr/lib/ruby/2.1.0/find.rb:47:in `catch'
from /usr/lib/ruby/2.1.0/find.rb:47:in `block in find'
from /usr/lib/ruby/2.1.0/find.rb:42:in `each'
from /usr/lib/ruby/2.1.0/find.rb:42:in `find'
from tartest2.rb:22:in `block (3 levels) in <main>'
from /usr/lib/ruby/2.1.0/rubygems/package/tar_writer.rb:85:in `new'
from tartest2.rb:21:in `block (2 levels) in <main>'
from tartest2.rb:20:in `wrap'
from tartest2.rb:20:in `block in <main>'
from tartest2.rb:19:in `open'
from tartest2.rb:19:in `<main>'
EDIT: I was able to resolve this by using TarWriter's add_file_simple instead of add_file. The file size needs to be obtained with File.stat; details are in the answer below.
Upvotes: 0
Views: 930
Reputation: 11
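As described in the OP, the solution is to use the add_file_simple method instead of add_file. add_file_simple takes the file size up front, which you can get from File.stat. As far as I can tell, add_file raises NonSeekableIO because it needs to reposition the output stream, and a Zlib::GzipWriter cannot seek; add_file_simple only ever appends.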
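To illustrate why the kind of output stream matters, here is a minimal in-memory sketch. The seekable? helper is hypothetical, and it assumes that the requirement behind NonSeekableIO amounts to the stream responding to pos=, which I have only inferred from the error above. The StringIO stands in for the output file.

```ruby
require 'rubygems/package'
require 'zlib'
require 'stringio'

# Hypothetical helper: true when the IO can be repositioned with `pos=`.
# Assumption: this is what add_file relies on.
def seekable?(io)
  io.respond_to?(:pos=)
end

raw = StringIO.new(String.new)   # binary in-memory stand-in for the output file
gz  = Zlib::GzipWriter.new(raw)  # compressed stream layered on top of it

puts "StringIO seekable?   #{seekable?(raw)}"
puts "GzipWriter seekable? #{seekable?(gz)}"

# add_file_simple is told the size up front, so it can write the header
# first and then only append, which works on the gzip stream.
content = "hello\n"
Gem::Package::TarWriter.new(gz) do |tar|
  tar.add_file_simple('hello.txt', 0644, content.bytesize) do |tio|
    tio.write content
  end
end
gz.finish  # writes the gzip trailer without closing `raw`
```

The size passed to add_file_simple has to match the number of bytes written in the block, which is why the method in this answer reads it from File.stat before opening the file.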
Here's a working method:
require 'rubygems/package'
require 'zlib'
require 'pathname'
require 'find'

# Similar to 'tar -C cdpath -zcf targzfile srcs', but without changing the
# process's working directory: each entry in 'src' is resolved against 'cdpath'.
def self.cdtargz(cdpath, targzfile, *src)
  path = Pathname.new(cdpath)
  raise "path #{cdpath} should be an absolute path" unless path.absolute?
  raise "path #{cdpath} should be a directory" unless File.directory? cdpath
  raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
  raise "no file or directory to tar" if !src || src.length == 0
  # Build the full paths without mutating the caller's strings.
  paths = src.map { |p| "#{cdpath}/#{p}" }
  File.open targzfile, 'wb' do |otargzfile|
    Zlib::GzipWriter.wrap otargzfile do |gz|
      Gem::Package::TarWriter.new gz do |tar|
        Find.find(*paths) do |f|
          relative_path = f.sub "#{cdpath}/", ""
          stat = File.stat(f)  # one stat call gives both mode and size
          if stat.directory?
            tar.mkdir relative_path, stat.mode
          else
            # add_file_simple is given the size up front
            tar.add_file_simple relative_path, stat.mode, stat.size do |tio|
              # 'rb' so the bytes read match the size reported by stat
              File.open f, 'rb' do |rio|
                tio.write rio.read
              end
            end
          end
        end
      end
    end
  end
end
EDIT: After reviewing the answer in this question, I revised the above slightly to avoid "slurping" whole files into memory. In my case 95% of the files are quite small but a few are very big, so reading in chunks makes a lot of sense. Here's the updated version:
BLOCKSIZE_TO_READ = 1024 * 1000

def self.cdtargz(cdpath, targzfile, *src)
  path = Pathname.new(cdpath)
  raise "path #{cdpath} should be an absolute path" unless path.absolute?
  raise "path #{cdpath} should be a directory" unless File.directory? cdpath
  raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
  raise "no file or directory to tar" if !src || src.length == 0
  # Build the full paths without mutating the caller's strings.
  paths = src.map { |p| "#{cdpath}/#{p}" }
  File.open targzfile, 'wb' do |otargzfile|
    Zlib::GzipWriter.wrap otargzfile do |gz|
      Gem::Package::TarWriter.new gz do |tar|
        Find.find(*paths) do |f|
          relative_path = f.sub "#{cdpath}/", ""
          stat = File.stat(f)  # one stat call gives both mode and size
          if stat.directory?
            tar.mkdir relative_path, stat.mode
          else
            tar.add_file_simple relative_path, stat.mode, stat.size do |tio|
              File.open f, 'rb' do |rio|
                # Copy in fixed-size chunks; read returns nil at end of file.
                while (buffer = rio.read(BLOCKSIZE_TO_READ))
                  tio.write buffer
                end
              end
            end
          end
        end
      end
    end
  end
end
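One way to sanity-check the resulting archive from Ruby is to read it back with Gem::Package::TarReader. The following is only a sketch: list_targz is a hypothetical helper name, not part of the method above, and it assumes the archive was written the same way (a tar stream inside gzip).

```ruby
require 'rubygems/package'
require 'zlib'

# Hypothetical helper: returns [name, kind, size] for every entry in a tar.gz.
def list_targz(targzfile)
  entries = []
  File.open(targzfile, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      Gem::Package::TarReader.new(gz) do |tar|
        tar.each do |entry|
          kind = entry.directory? ? :dir : :file
          entries << [entry.full_name, kind, entry.header.size]
        end
      end
    end
  end
  entries
end
```

Calling list_targz("test.tar.gz") after cdtargz should list the paths relative to cdpath, which makes it easy to compare against the output of tar -tzf.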
Upvotes: 1