Reputation: 607
Node.js developer here who has to work with Ruby, so I'm pretty new to a lot of concepts in Ruby and could use some help.
My use case is that I have to download very large newline-delimited JSON files from S3, transform the data, and upload it back to S3, all in memory without writing anything to disk.
In Node, I can do something like this:
s3DownloadStream('my-file').pipe(transformStream).pipe(backToS3Stream)
which transforms objects on the fly as they come in and uploads them to S3 concurrently.
I am having trouble finding a good plan of action to achieve the same behavior in Ruby. I have seen IO.pipe and Celluloid::IO as possible options, but neither quite seems like it will be able to do this.
Upvotes: 4
Views: 244
Reputation: 211720
Ruby doesn't have a direct analogue to Node streams, but it does have the Enumerable iterator framework, and through that there's the lazy option (Enumerator::Lazy). A lazy enumerator only emits data as necessary, unlike a regular enumerator chain, which runs each stage to completion before the next one starts.
If you set up a lazy chain, it evaluates bit by bit, not all at once.
So your code will look like:
s3_download('my-file').lazy.map do |...|
  # transform stream
end.each do |...|
  # pipe back to S3
end
To see how the lazy chain evaluates, here's a trivial example you can build on:
input = ('a'..'z')

input.lazy.map do |i|
  puts 'i=%s' % i
  i.upcase
end.each do |j|
  puts ' j=%s' % j
end
You can see how each value ripples through the chain individually. If you remove lazy, that's no longer the case: the first loop runs to completion, buffering its results into an array, and then the second kicks in and processes that array to completion as well.
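For instance, with lazy in place the two blocks interleave, handling one value at a time:

i=a
 j=A
i=b
 j=B
...

Drop lazy and every i= line prints before the first j= line appears.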
Node streams are a lot more complicated than this: they can pause/resume, defer an operation without blocking, and more, so there's only so much overlap in functionality. Ruby can do that sort of thing if you spend the time to wire up fibers and threads, but it's a lot of work; a minimal sketch of the threaded flavor is below.
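As a toy illustration only (nothing S3-specific), one hand-rolled equivalent of the back-pressure you get from pause/resume in Node is a bounded queue between a producer thread and a consumer thread:

# SizedQueue blocks the producer whenever the consumer falls behind,
# which is roughly what Node's pause/resume does for you automatically.
queue = SizedQueue.new(10)          # at most 10 items in memory at once

producer = Thread.new do
  ('a'..'z').each { |item| queue << item }   # blocks when the queue is full
  queue << :done                             # sentinel to stop the consumer
end

consumer = Thread.new do
  while (item = queue.pop) != :done
    puts item.upcase                         # stand-in for the upload step
  end
end

[producer, consumer].each(&:join)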
Upvotes: 1