Reputation: 1314
I have a directory with 100+ zip files and I need to read the files inside the zip files to do some data processing, without unzipping the archive.
Is there a Ruby library to read the contents of files in zip archives, without unzipping the file?
Using rubyzip gives an error:
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
# Extract to file/directory/symlink
puts "Extracting #{entry.name}"
entry.extract('here')
# Read into memory
content = entry.get_input_stream.read
end
end
Gives this error:
test.rb:12:in `block (2 levels) in <main>': undefined method `read' for Zip::NullInputStream:Module (NoMethodError)
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `call'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `block in each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/central_directory.rb:182:in `each'
from test.rb:6:in `block in <main>'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/file.rb:99:in `open'
from test.rb:4:in `<main>'
Upvotes: 10
Views: 13676
Reputation: 14625
The Zip::NullInputStream
is returned if the entry is a directory and not a file, could that be the case?
Here's a more robust variation of the code:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
if entry.directory?
puts "#{entry.name} is a folder!"
elsif entry.symlink?
puts "#{entry.name} is a symlink!"
elsif entry.file?
puts "#{entry.name} is a regular file!"
# Read into memory
entry.get_input_stream { |io| content = io.read }
# Output
puts content
else
puts "#{entry.name} is something unknown, oops!"
end
end
end
Upvotes: 21
Reputation: 4096
I came across the same issue and checking for if entry.file?
, before entry.get_input_stream.read
, resolved the issue.
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
# Extract to file/directory/symlink
puts "Extracting #{entry.name}"
entry.extract('here')
# Read into memory
if entry.file?
content = entry.get_input_stream.read
end
end
end
Upvotes: 3