Reputation: 8461
My company has data messages (JSON) stored in gzipped files on Amazon S3. I want to use Ruby to iterate through the files and do some analytics. I started using the 'aws/s3' gem, and I can get each file as an object:
#<AWS::S3::S3Object:0x4xxx4760 '/my.company.archive/data/msg/20131030093336.json.gz'>
But once I have this object, I do not know how to unzip it or even access the data inside of it.
Upvotes: 2
Views: 3247
Reputation: 51
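The following steps worked for me:
require "zlib"

# Stream the S3 object to a local file in binary mode.
# s3_client, bucket and key are assumed to be defined already.
file_path = "/tmp/gz/x.csv.gz"
File.open(file_path, "wb") do |f|
  # With a block, get_object yields the body in chunks.
  s3_client.get_object(bucket: bucket, key: key) do |chunk|
    f.write(chunk)
  end
end

# Decompress the file and collect the CSV rows (FastestCSV is a separate gem).
data = []
Zlib::GzipReader.open(file_path) do |gz_reader|
  csv_reader = ::FastestCSV.new(gz_reader)
  csv_reader.each do |row|
    data << row
  end
end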
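If you would rather not hard-code a path under /tmp, the same download-then-decompress steps can be sketched with a temp file. This is only a sketch: `fetch_gunzipped` is a hypothetical helper name, and `s3_client` is assumed to behave like the client above, i.e. `get_object` yields the body in chunks when given a block.

```ruby
require "tempfile"
require "zlib"

# Hypothetical helper: streams an S3 object into a temp file, then gunzips it.
# Assumes s3_client.get_object(bucket:, key:) yields body chunks to its block.
def fetch_gunzipped(s3_client, bucket, key)
  Tempfile.create(["s3-object", ".gz"]) do |tmp|
    tmp.binmode
    s3_client.get_object(bucket: bucket, key: key) { |chunk| tmp.write(chunk) }
    tmp.rewind
    # GzipReader reads the compressed bytes back from the temp file.
    Zlib::GzipReader.new(tmp).read
  end
end
```

The temp file is removed when the block exits, so nothing is left behind on disk. The returned string can then be handed to whatever CSV or JSON parser you use.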
Upvotes: 1
Reputation: 114
The S3Object documentation has been updated, and the stream method is no longer available: https://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
So the best way to read data from an S3 object would be this:
json_data = Zlib::GzipReader.new(StringIO.new(your_object.read)).read
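To try the decompression step without touching S3, here is a minimal sketch that gzips a string in memory and reads it back the same way. `gunzip_string` is a hypothetical helper name and the sample payload is made up; in real use the bytes would come from `your_object.read`.

```ruby
require "stringio"
require "zlib"

# Hypothetical helper: decompresses a gzipped String held in memory.
def gunzip_string(gz_bytes)
  reader = Zlib::GzipReader.new(StringIO.new(gz_bytes))
  reader.read
ensure
  reader.close if reader
end

# Made-up sample data standing in for the bytes returned by S3.
compressed = Zlib.gzip('{"msg":"hello"}')
json_data = gunzip_string(compressed)
```

Wrapping the string in a StringIO is what lets GzipReader treat it like a file.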
Upvotes: 0
Reputation: 2750
You can see the documentation for S3Object here: http://amazon.rubyforge.org/doc/classes/AWS/S3/S3Object.html.
You can fetch the content by calling your_object.value; see if you can get that far. Then it should be a question of unpacking the gzip blob. Zlib should be able to handle that.
I'm not sure whether .value returns a big string of binary data or an IO object. If it's a string, you can wrap it in a StringIO object and pass that to Zlib::GzipReader.new, e.g.
json_data = Zlib::GzipReader.new(StringIO.new(your_object.value)).read
S3Object has a stream method, which I would hope behaves like an IO object (I can't test that here, sorry). If so, you could do this:
json_data = Zlib::GzipReader.new(your_object.stream).read
Once you have the unzipped JSON content, you can just call JSON.parse on it, e.g.
JSON.parse Zlib::GzipReader.new(StringIO.new(your_object.value)).read
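The one-liner above assumes each file holds a single JSON document. If your files instead hold one JSON message per line (an assumption on my part, the question doesn't say), a sketch along these lines parses them one at a time. `parse_json_lines` is a hypothetical helper name and the sample data is made up.

```ruby
require "json"
require "stringio"
require "zlib"

# Hypothetical helper: parses gzipped bytes that hold one JSON message per line.
def parse_json_lines(gz_bytes)
  gz = Zlib::GzipReader.new(StringIO.new(gz_bytes))
  messages = []
  gz.each_line do |line|
    # Skip blank lines so a trailing newline doesn't break JSON.parse.
    messages << JSON.parse(line) unless line.strip.empty?
  end
  messages
ensure
  gz.close if gz
end

# Made-up sample standing in for your_object.value.
sample = Zlib.gzip(%({"id":1}\n{"id":2}\n))
messages = parse_json_lines(sample)
```

Reading line by line keeps only one decoded message string in flight at a time, which may matter if the archives are large.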
Upvotes: 1