Reputation: 1835
I am porting a piece of Ruby code I found on github.com to JavaScript. I understand the code except for the part below. My question is: how does the `loop do` end if there are no `break`s?
```ruby
def utf8_bytes(record_size, record_tag)
  Enumerator.new do |yielder|
    bits = compressed_bits record_size, record_tag
    loop do
      # Use the Huffman tree to decode the first character.
      node = tree_root
      while node < 0x100
        # p ['node', node]
        bit = bits.next
        # p ['bit', bit]
        node = (bit == 0) ? tree_left[node] : tree_right[node]
      end
      first_byte = node - 0x100
      # p ['utf8 start', first_byte]
      yielder << first_byte
      # The other characters are 10xxxxxx, where x'es are raw bits.
      2.upto utf8_char_bytes(first_byte) do
        byte = 0b10
        6.times do
          byte = (byte << 1) | bits.next
        end
        # p ['utf8 byte', byte]
        yielder << byte
      end
    end
  end
end
```
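One Ruby detail bears directly on the question: `Kernel#loop` rescues `StopIteration`, and calling `next` on an exhausted Enumerator raises `StopIteration`. So if `compressed_bits` returns a finite Enumerator (an assumption, since its source is not shown here), the first `bits.next` after the last bit ends the `loop do` without any `break`. A minimal sketch, with a hypothetical three-bit source standing in for `compressed_bits`:

```ruby
# Hypothetical stand-in for compressed_bits: a finite Enumerator over three bits.
bits = [1, 0, 1].each   # Array#each without a block returns an Enumerator

seen = []
loop do
  # bits.next raises StopIteration once all three bits are consumed;
  # Kernel#loop rescues it and stops, so no explicit break is needed.
  seen << bits.next
end

p seen  # prints [1, 0, 1]
```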
Update
Thanks for all the answers, but unfortunately I still don't understand what is really happening. If I understand correctly, it is like a bucket: every time you put something into it, it gets processed, and the `loop do` runs as many times as there are bytes put into it.
The function is called only once, like so:
```ruby
text = utf8_bytes(record_size, record_tag).to_a.pack('C*')
```
But this is inside an Enumerator as well, so I guess the bytes drip from one bucket into the other.
In any case, I have translated the function into JavaScript. Can someone tell me whether it is correct? (Leaving aside that the JavaScript function returns an array, and that using arrays like this is probably not very efficient.)
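The "bucket" picture is close, with one refinement: nothing is computed until a consumer asks for a value. When `to_a` pulls from an outer Enumerator that calls `next` on an inner one, the two blocks take turns, one value at a time. A minimal sketch with two hypothetical Enumerators (`source` playing the role of `compressed_bits`, `relay` playing the role of `utf8_bytes`) that records the order in which things happen:

```ruby
log = []

# Hypothetical inner Enumerator, playing the role of compressed_bits.
source = Enumerator.new do |y|
  3.times do |i|
    log << "produce #{i}"
    y << i
  end
end

# Hypothetical outer Enumerator, playing the role of utf8_bytes.
relay = Enumerator.new do |y|
  loop do
    value = source.next      # resumes source just long enough to get one value
    log << "relay #{value}"
    y << value * 2
  end                        # ends when source.next raises StopIteration
end

bytes = relay.to_a
p bytes    # prints [0, 2, 4]
puts log   # produce 0, relay 0, produce 1, relay 1, produce 2, relay 2 (one per line)
```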
```javascript
function utf8_bytes( record_size, record_tag ) {
  var yielder = new Array();
  var bits = compressed_bits( record_size, record_tag );  // declared with var to avoid an implicit global
  // compressed_bits returns an array of 0's and 1's
  var v = 0;
  while ( v < bits.length ) {
    // # Use the Huffman tree to decode the first character.
    var node = tree_root;
    while ( node < 0x100 ) {
      // # p ['node', node]
      var bit = bits[v++];  // declared with var to avoid an implicit global
      // # p ['bit', bit]
      node = (bit == 0) ? tree_left[node] : tree_right[node];
    }
    var first_byte = node - 0x100;
    // # p ['utf8 start', first_byte]
    yielder.push( first_byte );
    // # The other characters are 10xxxxxx, where x'es are raw bits.
    for ( var m = 2; m <= utf8_char_bytes(first_byte); m++ ) {
      var mbyte = 2;  // 0b10
      for ( var n = 0; n < 6; n++ ) {
        mbyte = (mbyte << 1) | bits[v++];
      }
      // # p ['utf8 byte', mbyte]
      yielder.push( mbyte );
    }
  }
  return( yielder );
}
```
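One way to check the translation is to compare its loop shape with the Enumerator version on input where the bits run out. The sketch below, in Ruby to match the original, uses a hypothetical toy format in which every byte is 8 raw bits (no Huffman tree, no `utf8_char_bytes`). `enum_bytes` stops via `StopIteration`; `index_bytes` mirrors the JavaScript, checking the bounds only at the top of the outer loop and treating out-of-range reads as 0, much as `(mbyte << 1) | undefined` evaluates in JavaScript. The two agree on complete input but differ when trailing bits do not fill a whole byte:

```ruby
# Hypothetical toy format: each byte is 8 raw bits, most significant first.
def enum_bytes(bit_array)
  Enumerator.new do |yielder|
    bits = bit_array.each                          # external Enumerator, like compressed_bits
    loop do
      byte = 0
      8.times { byte = (byte << 1) | bits.next }   # StopIteration ends the loop
      yielder << byte
    end
  end
end

# Mirrors the JavaScript translation: bounds checked only at the top of the loop.
def index_bytes(bit_array)
  out = []
  v = 0
  while v < bit_array.length
    byte = 0
    8.times do
      byte = (byte << 1) | bit_array[v].to_i       # nil.to_i is 0, like undefined in `|`
      v += 1
    end
    out << byte
  end
  out
end

a_bits = [0, 1, 0, 0, 0, 0, 0, 1]   # 65, the byte for "A"
p enum_bytes(a_bits).to_a           # prints [65]
p index_bytes(a_bits)               # prints [65]

ragged = a_bits + [1, 0, 1]         # three leftover bits
p enum_bytes(ragged).to_a           # prints [65]
p index_bytes(ragged)               # prints [65, 160]
```

If the real `compressed_bits` always ends exactly on a character boundary, the difference never shows; if it can end with padding bits, the JavaScript version may need a bounds check before each read.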
Upvotes: 1
Views: 131
Reputation: 79622
The loop never ends by itself, i.e. this method returns an infinite enumerator.
```ruby
utf8_bytes(...).to_a # => never ends
```
These can be very useful, as the block you call them with can return before consuming the whole (infinite) enumerator:
```ruby
def foo
  utf8_bytes(...).each do |byte|
    return byte if is_it_what_youre_looking_for?(byte)
  end
  # You'll never get here!
end
```
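A runnable version of that pattern, with a hypothetical `naturals` Enumerator (an infinite counter) in place of `utf8_bytes(...)` and a hypothetical `first_over` helper in place of `foo`:

```ruby
# Hypothetical infinite Enumerator: 0, 1, 2, ... forever (no break, no StopIteration).
naturals = Enumerator.new do |yielder|
  n = 0
  loop do
    yielder << n
    n += 1
  end
end

# Returns the first value above limit; `return` exits the method,
# abandoning the rest of the infinite sequence.
def first_over(enum, limit)
  enum.each { |x| return x if x > limit }
  nil # only reached if enum is finite and has no value above limit
end

p first_over(naturals, 10)   # prints 11
```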
In a similar fashion, it is useful for getting just a few values. For example:
```ruby
utf8_bytes(...).first(100) # => array of length 100
```
To play around with a simpler "infinite" enumerator, you can use `0..Float::INFINITY` instead of calling `utf8_bytes`.
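For example, with that range: `first(n)` stops pulling after n values, and `lazy` lets `map` and `select` work on the infinite range by passing values through one at a time instead of building a whole intermediate array first:

```ruby
infinite = 0..Float::INFINITY

p infinite.first(5)                          # prints [0, 1, 2, 3, 4]
p infinite.lazy.map { |x| x * x }.first(4)   # prints [0, 1, 4, 9]
p infinite.lazy.select(&:even?).first(3)     # prints [0, 2, 4]
```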
Upvotes: 0
Reputation: 35541
In `Enumerator::Yielder`, the `<<` method works like the `yield` method (the only difference is that `<<` returns the yielder itself, so calls can be chained). So calling:
```ruby
yielder << some_byte
```
is the same as:
```ruby
yielder.yield some_byte
```
Calling `yield` suspends the block at that point. Control returns to it when `next` (or an equivalent C function) is called on the Enumerator object. If `next` is never called again, the loop does not continue; it stays suspended until the Enumerator falls out of scope and is garbage collected.
You can read up on the `Enumerator` class for more info.
Upvotes: 1
Reputation: 11061
It appears to be an enumerator (note the `Enumerator.new do |yielder|`). My guess is that control flow returns every time the append operator (`<<`) is applied to `yielder`.
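That guess can be checked with internal iteration (`each`, which `to_a` also uses): each `<<` hands the value to the consumer's block, and the producer continues only after that block returns. A hypothetical trace:

```ruby
trace = []

e = Enumerator.new do |y|
  trace << :produce_a
  y << :a
  trace << :produce_b
  y << :b
end

e.each { |x| trace << [:consume, x] }
p trace
# prints [:produce_a, [:consume, :a], :produce_b, [:consume, :b]]
```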
Upvotes: 1