Reputation: 21
In my quest to understand Ruby's Enumerable, I have something similar to the following:
FileReader.read(very_big_file)
  .lazy
  .flat_map { |line| get_array_of_similar_words } # array.size is ~10
  .each_slice(100) # wait for 100 items
  .map { |array| process_100_items }
Since each flat_map call emits an array of ~10 items, I was expecting the each_slice call to batch the items in 100s, i.e. to wait until there are 100 items before passing them to the final .map call, but that is not the case.
How do I achieve functionality similar to the buffer function in reactive programming?
Upvotes: 0
Views: 2314
Reputation: 110675
To see how lazy affects the calculations, let's look at an example. First construct a file:
str =<<~_
Now is the
time for all
good Ruby coders
to come to
the aid of
their bowling
team
_
fname = 't'
File.write(fname, str)
#=> 82
and specify the slice size:
slice_size = 4
Now I will read the lines one by one, split each line into words, remove duplicate words within the line, and append the remaining words to an array. As soon as the array contains at least 4 words, I will take the first four and map them to the longest of the four (the first such word in case of a tie). The code to do that follows. To show how the calculations progress, I will salt the code with puts statements. Note that IO::foreach without a block returns an enumerator.
IO.foreach(fname).
  lazy.
  tap { |o| puts "o1 = #{o}" }.
  flat_map { |line|
    puts "line = #{line}"
    puts "line.split.uniq = #{line.split.uniq} "
    line.split.uniq }.
  tap { |o| puts "o2 = #{o}" }.
  each_slice(slice_size).
  tap { |o| puts "o3 = #{o}" }.
  map { |arr|
    puts "arr = #{arr}, arr.max = #{arr.max_by(&:size)}"
    arr.max_by(&:size) }.
  tap { |o| puts "o3 = #{o}" }.
  to_a
#=> ["time", "good", "coders", "bowling", "team"]
The following is displayed:
o1 = #<Enumerator::Lazy:0x00005992b1ab6970>
o2 = #<Enumerator::Lazy:0x00005992b1ab6880>
o3 = #<Enumerator::Lazy:0x00005992b1ab6678>
o3 = #<Enumerator::Lazy:0x00005992b1ab6420>
line = Now is the
line.split.uniq = ["Now", "is", "the"]
line = time for all
line.split.uniq = ["time", "for", "all"]
arr = ["Now", "is", "the", "time"], arr.max = time
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"]
arr = ["for", "all", "good", "Ruby"], arr.max = good
line = to come to
line.split.uniq = ["to", "come"]
line = the aid of
line.split.uniq = ["the", "aid", "of"]
arr = ["coders", "to", "come", "the"], arr.max = coders
line = their bowling
line.split.uniq = ["their", "bowling"]
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
line = team
line.split.uniq = ["team"]
arr = ["team"], arr.max = team
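The interleaving in this output suggests that a lazy chain pulls only as much input as it needs. A minimal sketch of that, assuming an in-memory array (the hypothetical name lines) in place of the file and a hypothetical counter reads that records how many lines the flat_map block sees:

```ruby
# In-memory stand-in for the file's lines (an assumption for this sketch).
lines = ["Now is the", "time for all", "good Ruby coders", "to come to",
         "the aid of", "their bowling", "team"]

reads = 0 # hypothetical counter: how many lines flat_map has been handed

result = lines.each.lazy.
  flat_map { |line|
    reads += 1
    line.split.uniq }.
  each_slice(4).
  map { |arr| arr.max_by(&:size) }.
  first(2) # forces the chain, but only until two slices have been mapped

result #=> ["time", "good"]
reads  #=> 3
```

Only three of the seven lines are split, because first(2) stops the iteration as soon as the second slice has been mapped.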
If the line lazy. is removed from the IO.foreach chain, the return value is the same, but the following is displayed (the .to_a at the end now being superfluous):
o1 = #<Enumerator:0x00005992b1a438f8>
line = Now is the
line.split.uniq = ["Now", "is", "the"]
line = time for all
line.split.uniq = ["time", "for", "all"]
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"]
line = to come to
line.split.uniq = ["to", "come"]
line = the aid of
line.split.uniq = ["the", "aid", "of"]
line = their bowling
line.split.uniq = ["their", "bowling"]
line = team
line.split.uniq = ["team"]
o2 = ["Now", "is", "the", "time", "for", "all", "good", "Ruby",
"coders", "to", "come", "the", "aid", "of", "their",
"bowling", "team"]
o3 = #<Enumerator:0x00005992b1a41a08>
arr = ["Now", "is", "the", "time"], arr.max = time
arr = ["for", "all", "good", "Ruby"], arr.max = good
arr = ["coders", "to", "come", "the"], arr.max = coders
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
arr = ["team"], arr.max = team
o3 = ["time", "good", "coders", "bowling", "team"]
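The same pattern can be applied to the pipeline in the question. The following is only a sketch under stated assumptions: similar_words and process_batch are hypothetical stand-ins for the question's get_array_of_similar_words and process_100_items, and a small array stands in for the big file.

```ruby
# Hypothetical helper: returns 10 "similar words" for a line.
def similar_words(line)
  Array.new(10) { |i| "#{line}-#{i}" }
end

# Hypothetical helper: "processes" a batch by reporting its size.
def process_batch(batch)
  batch.size
end

lines = (1..25).map { |n| "line#{n}" } # stands in for the big file

batch_sizes = lines.each.lazy.
  flat_map { |line| similar_words(line) }. # ~10 items per line
  each_slice(100).                         # buffers items across lines
  map { |batch| process_batch(batch) }.
  to_a                                     # nothing runs until the chain is forced

batch_sizes #=> [100, 100, 50]
```

Each batch holds 100 items drawn from ten consecutive lines; the final batch is shorter because only 50 items remain. Without a forcing call such as to_a, first or each at the end, the lazy chain performs no work at all.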
Upvotes: 3