Danny Kulchinsky

Reputation: 11

Create a tar.gz with the contents of a specific path (without chdir) in Ruby

I'm working on a method in Ruby that creates a tar.gz file archiving the directories and files under a certain path (cdpath). It should behave like tar -C cdpath -zcf targzfile srcs, but without changing the current working directory (to keep it thread safe). I'm using Gem::Package::TarWriter to create the tar stream and wrapping it in Zlib::GzipWriter to compress it.

Here's what I came up with (this is just a simple standalone test):

require 'rubygems/package'
require 'zlib'
require 'pathname'
require 'find'

cdpath="/absolute/path/to/some/place"
targzfile="test.tar.gz"
src=["some-dir-name-at-cdpath"]

BLOCKSIZE_TO_READ = 1024 * 1000

path = Pathname.new(cdpath)
raise "path #{cdpath} should be an absolute path" unless path.absolute?
raise "path #{cdpath} should be a directory" unless File.directory? cdpath
raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
raise "no file or directory to tar" if !src || src.length == 0

src.each { |p| p.sub! /^/, "#{cdpath}/" }
File.open targzfile, 'wb' do |otargzfile|
  Zlib::GzipWriter.wrap otargzfile do |gz|
    Gem::Package::TarWriter.new gz do |tar|
      Find.find *src do |f|
        relative_path = f.sub "#{cdpath}/", ""
        mode = File.stat(f).mode
        if File.directory? f
          tar.mkdir relative_path, mode
        else
          File.open f, 'rb' do |rio|
            tar.add_file relative_path, mode do |tio|
              tio.write rio.read
            end
          end
        end
      end
    end
  end
end

However, I'm hitting the following exception and I can't seem to figure out what I'm doing wrong.

/usr/lib/ruby/2.1.0/rubygems/package/tar_writer.rb:108:in `add_file': Gem::Package::NonSeekableIO (Gem::Package::NonSeekableIO)
        from tartest2.rb:29:in `block (5 levels) in <main>'
        from tartest2.rb:28:in `open'
        from tartest2.rb:28:in `block (4 levels) in <main>'
        from /usr/lib/ruby/2.1.0/find.rb:48:in `block (2 levels) in find'
        from /usr/lib/ruby/2.1.0/find.rb:47:in `catch'
        from /usr/lib/ruby/2.1.0/find.rb:47:in `block in find'
        from /usr/lib/ruby/2.1.0/find.rb:42:in `each'
        from /usr/lib/ruby/2.1.0/find.rb:42:in `find'
        from tartest2.rb:22:in `block (3 levels) in <main>'
        from /usr/lib/ruby/2.1.0/rubygems/package/tar_writer.rb:85:in `new'
        from tartest2.rb:21:in `block (2 levels) in <main>'
        from tartest2.rb:20:in `wrap'
        from tartest2.rb:20:in `block in <main>'
        from tartest2.rb:19:in `open'
        from tartest2.rb:19:in `<main>'

EDIT: I was able to resolve this by using TarWriter's add_file_simple instead of add_file. add_file_simple needs the file size up front, which can be obtained with File.stat. Details are in the answer below.

Upvotes: 0

Views: 930

Answers (1)

Danny Kulchinsky

Reputation: 11

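As described in the OP, the solution is to use the add_file_simple method instead of add_file. add_file writes a placeholder header, streams the data, and then seeks back in the output to fill in the real size, so it needs a seekable IO. Zlib::GzipWriter cannot seek, hence Gem::Package::NonSeekableIO. add_file_simple takes the size as an argument and writes the header before the data, so it never seeks; the catch is that you have to obtain the file size yourself, e.g. with File.stat.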
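A minimal in-memory sketch of the difference, assuming a RubyGems TarWriter that behaves as in the trace above. It probes whether a GzipWriter responds to pos= (the seek that add_file relies on), then writes one entry to a gzip stream with add_file_simple. The entry name hello.txt and its contents are made up for illustration.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# add_file needs io.pos= to go back and rewrite the header once the size
# is known; compare a gzip stream with a plain (seekable) StringIO.
probe = Zlib::GzipWriter.new(StringIO.new(''.b))
puts "GzipWriter seekable? #{probe.respond_to?(:pos=)}"
puts "StringIO seekable?   #{StringIO.new.respond_to?(:pos=)}"
probe.close

# add_file_simple is told the size up front, so it writes the header first
# and then the data, never seeking; that is why it works on a gzip stream.
data = "hello, tar\n"
out = StringIO.new(''.b)
gz = Zlib::GzipWriter.new(out)
Gem::Package::TarWriter.new(gz) do |tar|
  tar.add_file_simple('hello.txt', 0644, data.bytesize) { |io| io.write(data) }
end
gz.finish # write the gzip trailer without closing the StringIO

puts "archive size: #{out.string.bytesize} bytes"
```

The same tar entry could also be written with add_file into a seekable target (a StringIO or a regular File) and gzipped afterwards, but that means buffering the whole uncompressed tar first.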

Here's a working method:

  require 'rubygems/package'
  require 'zlib'
  require 'pathname'
  require 'find'

  # Similar to 'tar -C cdpath -zcf targzfile srcs': every entry in 'srcs' is
  # relative to 'cdpath', but the process's working directory is never changed
  # (Dir.chdir is process-wide, so avoiding it keeps this thread safe).
  def self.cdtargz(cdpath, targzfile, *src)
    path = Pathname.new(cdpath)
    raise "path #{cdpath} should be an absolute path" unless path.absolute?
    raise "path #{cdpath} should be a directory" unless File.directory? cdpath
    raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
    raise "no file or directory to tar" if src.empty?

    # Prefix each source with cdpath without mutating the caller's strings
    srcs = src.map { |p| "#{cdpath}/#{p}" }
    File.open targzfile, 'wb' do |otargzfile|
      Zlib::GzipWriter.wrap otargzfile do |gz|
        Gem::Package::TarWriter.new gz do |tar|
          Find.find(*srcs) do |f|
            relative_path = f.sub "#{cdpath}/", ""
            stat = File.stat(f)
            if stat.directory?
              tar.mkdir relative_path, stat.mode
            else
              # add_file_simple needs the size up front because the gzip
              # stream cannot seek back to patch the header (add_file does)
              tar.add_file_simple relative_path, stat.mode, stat.size do |tio|
                File.open f, 'rb' do |rio|
                  tio.write rio.read
                end
              end
            end
          end
        end
      end
    end
  end
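To check an archive produced by cdtargz without shelling out to tar, it can be read back as a stream with Gem::Package::TarReader wrapped in Zlib::GzipReader. A minimal sketch; list_targz is a hypothetical helper name, not part of the code above.

```ruby
require 'rubygems/package'
require 'zlib'

# Hypothetical helper: returns [[name, size, directory?], ...] for every
# entry in a .tar.gz, reading it as a stream without extracting anything.
def list_targz(path)
  entries = []
  Zlib::GzipReader.open(path) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each do |entry|
        entries << [entry.full_name, entry.header.size, entry.directory?]
      end
    end
  end
  entries
end
```

For example, list_targz('test.tar.gz').each { |name, size, dir| puts "#{dir ? 'd' : '-'} #{size} #{name}" } should print one line per directory and file, with names relative to cdpath.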

EDIT: After reviewing the answer to this question, I revised the above slightly to avoid "slurping" whole files into memory. In my case 95% of the files are quite small, but a few are very big, so reading in fixed-size blocks makes a lot of sense. Here's the updated version:

  # Chunk size used to stream file contents (1024 * 1000 bytes, roughly 1 MB)
  BLOCKSIZE_TO_READ = 1024 * 1000

  def self.cdtargz(cdpath, targzfile, *src)
    path = Pathname.new(cdpath)
    raise "path #{cdpath} should be an absolute path" unless path.absolute?
    raise "path #{cdpath} should be a directory" unless File.directory? cdpath
    raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
    raise "no file or directory to tar" if src.empty?

    # Prefix each source with cdpath without mutating the caller's strings
    srcs = src.map { |p| "#{cdpath}/#{p}" }
    File.open targzfile, 'wb' do |otargzfile|
      Zlib::GzipWriter.wrap otargzfile do |gz|
        Gem::Package::TarWriter.new gz do |tar|
          Find.find(*srcs) do |f|
            relative_path = f.sub "#{cdpath}/", ""
            stat = File.stat(f)
            if stat.directory?
              tar.mkdir relative_path, stat.mode
            else
              tar.add_file_simple relative_path, stat.mode, stat.size do |tio|
                File.open f, 'rb' do |rio|
                  # Copy in fixed-size chunks so large files are never
                  # loaded into memory all at once
                  while (buffer = rio.read(BLOCKSIZE_TO_READ))
                    tio.write buffer
                  end
                end
              end
            end
          end
        end
      end
    end
  end
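As an alternative to the hand-written read loop, IO.copy_stream can do the chunked copy: it accepts any destination object that responds to write, which the stream yielded by add_file_simple does. This is a swapped-in technique, not the code above, and add_file_from_disk is a hypothetical helper name.

```ruby
require 'rubygems/package'

# Hypothetical helper: adds one regular file to an open TarWriter,
# streaming it via IO.copy_stream instead of reading it all into memory.
def add_file_from_disk(tar, path, entry_name)
  stat = File.stat(path)
  tar.add_file_simple(entry_name, stat.mode, stat.size) do |tio|
    File.open(path, 'rb') { |rio| IO.copy_stream(rio, tio) }
  end
end
```

Inside the Find.find block, the else branch would then become add_file_from_disk(tar, f, relative_path). With either version, if a file grows between File.stat and the copy, the bounded stream raises Gem::Package::TarWriter::FileOverflow rather than writing past the declared size.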

Upvotes: 1
