Reputation: 8305
Given the following two pieces of code:
def hello(z)
  "hello".gsub(/(o)/, &z)
end
z = proc {|m| p $1}
hello(z)
# prints: nil

def hello
  z = proc {|m| p $1}
  "hello".gsub(/(o)/, &z)
end
hello
# prints: "o"
Why are the outputs of these two pieces of code different? Is there a way to pass a block to gsub from outside of the method definition so that the variables $1, $2 would be evaluated in the same way as if the block was given inside the method definition?
Upvotes: 9
Views: 1470
Reputation: 1830
Thanks @masa-sakano for the detailed background answer.
However, none of the existing answers work correctly with gsub, because they pass only the first match data to the block (either directly as an argument, or indirectly via the binding), while gsub operates on all occurrences of the pattern in the string.
Here's a gsub wrapper built on top of @masa-sakano's answer that works correctly. It calls the given blk from inside another block, after setting the current match data in blk's binding on each iteration of gsub:
def gsub_wrapper(str, re, blk)
  str.gsub(re) do |m|
    blk.binding.tap do |b|
      b.local_variable_set(:_, $~)
      b.eval("$~=_")
    end
    blk.call(m)
  end
end
Incidentally, it's also a solution to this RUBY PUZZLER: GSUB, BLOCKS, AND PROCS:
$ cat <<-'EOF' > ruby-puzzler-gsub-blocks-and-procs.rb
str = "hello world"
upc = Proc.new {|m| $1.upcase}
puts str.gsub(/([aeiou])/, &upc)
puts str.gsub(/(\w)/, &upc)
def doit(str, re, blk)
  puts str.gsub(re, &blk)
end
doit "hello world", /([aeiou])/, upc
doit "hello world", /(\w)/, upc
def gsub_wrapper(str, re, blk)
  str.gsub(re) do |m|
    blk.binding.tap do |b|
      b.local_variable_set(:_, $~)
      b.eval("$~=_")
    end
    blk.call(m)
  end
end
puts gsub_wrapper(str, /([aeiou])/, upc)
puts gsub_wrapper(str, /(\w)/, upc)
EOF
$ ruby ruby-puzzler-gsub-blocks-and-procs.rb
hEllO wOrld
HELLO WORLD
hDllD wDrld
DDDDD DDDDD
hEllO wOrld
HELLO WORLD
Upvotes: 0
Reputation: 2267
Here is a workaround (Ruby 2). With it, the given Proc z sees $1 as a block given directly to String#gsub would, though only for the match made by the explicit match call (the first one).
def hello(z)
  "hello".match(/(o)/) # Sets $1, $2, ...
  z.binding.tap do |b|
    b.local_variable_set(:_, $~)
    b.eval("$~=_")
  end
  "hello".gsub(/(o)/, &z)
end
z = proc {|m| p $1}
hello(z)
# prints: "o"
The background is explained in detail in this answer to the question "How to pass Regexp.last_match to a block in Ruby" (posted in 2018).
Upvotes: 2
Reputation: 8898
Things like $1 and $2 act like LOCAL VARIABLES, despite their leading $. You can try the code below to verify this:
def foo
  /(hell)o/ =~ 'hello'
  $1
end
def bar
  $1
end
foo #=> "hell"
bar #=> nil
Your problem arises because the proc z is defined outside the method hello, so z accesses the $1 in the context of main, but gsub sets the $1 in the context of the method hello.
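As a rough illustration of that point, the sketch below compares what a method sees in $1 with what a proc defined outside it sees. The names match_and_call and reader are made up for this example.

```ruby
# Hypothetical helper: performs a match, then reports $1 as seen by the
# method itself and as seen by the proc that was passed in.
def match_and_call(reader)
  /(hell)o/ =~ "hello"   # sets $~ (and therefore $1) in this method's frame
  [$1, reader.call]      # the method's view, then the proc's view
end

reader = proc { $1 }     # defined at top level, so it reads the top-level $1

inside, from_proc = match_and_call(reader)
p inside     # "hell"
p from_proc  # nil, because no match has happened in the top-level frame
```

The proc keeps reading $1 from the place where it was written, which is the same effect the question runs into with gsub.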
Upvotes: 3
Reputation: 4489
The two versions are different because the $1 variable is thread-local and method-local. In the first example, the proc reads the $1 that belongs to the scope outside the hello method, where no match has happened. In the second example, the proc reads the $1 that belongs to the hello method, which is where gsub sets it.
There is no direct way to pass $1 in a block to gsub from outside of the method definition.
Note that gsub passes the matched string into the block, so z = proc { |m| pp m } will only work as long as the part you want is the whole match. As soon as your regular expression matches anything beyond the capture you want, you're out of luck.
For example, "hello".gsub(/l(o)/) { |m| m } => hello, because the whole matched string was passed to the block.
Whereas "hello".gsub(/l(o)/) { |m| $1 } => helo, because the l that was matched is discarded by the block; all we are interested in is the captured o.
My solution is to match the regular expression, then pass the MatchData object into the block:
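require 'pp'
def hello(z)
  string = "hello"
  regex = /(o)/
  m = string.match(regex) # MatchData for the first match only
  # z is called once, up front; its return value is used as the replacement string
  string.gsub(regex, z.call(m))
end
z = proc { |m| pp m[1] }
pp hello(z)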
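If the proc should run once per occurrence rather than once up front, one possible variation is to call it from a block given to gsub and hand it Regexp.last_match each time. This is only a sketch: gsub_with_match and upcase_capture are hypothetical names, not part of the answer above.

```ruby
# Hypothetical helper: runs gsub and passes each occurrence's MatchData
# to the given callable, using its return value as the replacement.
def gsub_with_match(string, regex, callable)
  string.gsub(regex) do
    # Regexp.last_match is read here, inside the method where gsub sets it,
    # so it refers to the occurrence currently being replaced.
    callable.call(Regexp.last_match)
  end
end

upcase_capture = proc { |md| md[1].upcase }
puts gsub_with_match("hello world", /([aeiou])/, upcase_capture)
# hEllO wOrld
```

The proc never touches $1 itself, so it does not matter where it was defined.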
Upvotes: 2
Reputation: 20116
Why is the output different?
A proc in Ruby has lexical scope. This means that when it refers to a variable it does not define itself, the variable is resolved in the context where the proc was defined, not where it is called. This explains the behavior of your code.
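The same rule can be seen with an ordinary local variable. In this small sketch (call_it, greeting and reader are made-up names), the proc ignores the variable of the same name in the method that calls it:

```ruby
# Hypothetical helper: has its own `greeting`, then calls the given proc.
def call_it(callable)
  greeting = "set inside call_it"   # local to this method, invisible to the proc
  [greeting, callable.call]
end

greeting = "set at definition site"
reader = proc { greeting }          # closes over the top-level `greeting`

p call_it(reader)
# ["set inside call_it", "set at definition site"]
```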
Notice that the block is defined before the regexp match runs, which can cause confusion. The problem involves a magic Ruby variable, which works quite differently from other variables. Citing @JörgWMittag:
It's rather simple, really: the reason why $SAFE doesn't behave like you would expect from a global variable is because it isn't a global variable. It's a magic unicorn thingamajiggy.
There are quite a few of those magic unicorn thingamajiggies in Ruby, and they are unfortunately not very well documented (not at all documented, in fact), as the developers of the alternative Ruby implementations found out the hard way. These thingamajiggies all behave differently and (seemingly) inconsistently, and pretty much the only two things they have in common is that they look like global variables but don't behave like them.
Some have local scope. Some have thread-local scope. Some magically change without anyone ever assigning to them. Some have magic meaning for the interpreter and change how the language behaves. Some have other weird semantics attached to them.
If you really want to find out exactly how the $1 and $2 variables work, I assume the only "documentation" you will find is rubyspec, a spec for Ruby written the hard way by the Rubinius folks. Happy hacking, but be prepared for the pain.
Is there a way to pass a block to gsub from another context with the $1, $2 variables set up the right way?
You can achieve what you want with the following modification (but I bet you already know that):
require 'pp'
def hello(z)
  #z = proc {|m| pp $1}
  "hello".gsub(/(o)/, &z)
end
z = proc {|m| pp m}
hello(z)
I'm not aware of a way to change the scope of a proc on the fly. But would you really want to do this?
Upvotes: 4