Reputation: 514
I'm having trouble matching one string against another when the string I'm testing for contains regex special characters.
Background: I'm working on a script that migrates news articles from two legacy systems into one. In some cases these stories are duplicated across the systems, so I'm running a script that checks stored data against an archive file (in HTML form) to see whether the title of the current story matches anything in the archive.
#...(for each line)
if line.match(title)
  return true
end
This generally works, except when the title contains a regex special character, for example:
<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>
doesn't match
$8.9 Million Grant for UC Center Focused on Occupational Safety and Health
Here's some example output from irb to demonstrate:
2.3.0 :012 > str = '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>'
2.3.0 :020 > str.match("$8.9 Million Grant for UC Center Focused on Occupational Safety and Health")
=> nil
2.3.0 :021 > str.match("\\$8.9 Million Grant for UC Center Focused on Occupational Safety and Health")
=> #<MatchData "$8.9 Million Grant for UC Center Focused on Occupational Safety and Health">
2.3.0 :022 > str.match("8.9 Million Grant for UC Center Focused on Occupational Safety and Health")
=> #<MatchData "8.9 Million Grant for UC Center Focused on Occupational Safety and Health">
2.3.0 :023 >
So I'm pretty sure the $ is the issue, and that it stems from $ being a reserved regex character (it anchors the end of a line).
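To see why the unescaped pattern fails: String#match turns a String argument into a Regexp, and in a Ruby regex $ matches only at the end of a line, so a pattern that continues with "8.9 ..." after it can never match ordinary text. The sketch below reuses the strings from the irb session; the short extra strings are illustrative only. It also shows that the unescaped "." matches any character, which is why the $-less attempt happened to work.

```ruby
str   = '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>'
title = '$8.9 Million Grant for UC Center Focused on Occupational Safety and Health'

# String#match converts the String into a Regexp, so "$" and "." are
# treated as metacharacters rather than literal characters.
unescaped = str.match(title)
p unescaped            # => nil

# "$" is an end-of-line anchor: it matches at the end of the string (or
# before a newline), so text after it in the pattern cannot match mid-line.
p("cost: 5" =~ /5$/)   # => 6
p("$5" =~ /$5/)        # => nil

# "." matches any single character, so "8.9" also matches "889".
p("889" =~ /8.9/)      # => 0
```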
Ruby isn't my daily language, and I'm having trouble figuring out where to look to see whether there is a Ruby method that does the match without relying on regex, treats the pattern literally, or automatically escapes regex special characters. Help is appreciated.
Upvotes: 1
Views: 61
Reputation: 1767
If you don't need the MatchData (for example, where in the string the target text occurs), a much simpler solution is to use String#include?:
str.include?("$8.9 Million")
# => true
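Applied to the duplicate check from the question, a minimal sketch might look like the following. The method name duplicate_title? and the idea of passing the archive as an array of lines are assumptions for illustration, not part of the original script, and the first archive line is made-up sample data. String#include? compares plain substrings, so no escaping is needed.

```ruby
# Hypothetical helper: true if any archive line contains the title as a
# plain substring. String#include? does no regex interpretation, so "$"
# and "." are compared literally.
def duplicate_title?(archive_lines, title)
  archive_lines.any? { |line| line.include?(title) }
end

# Hypothetical archive contents (the second line is from the question).
archive = [
  '<span class="title">Campus Library Extends Hours</span>',
  '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>'
]

p duplicate_title?(archive, '$8.9 Million Grant for UC Center Focused on Occupational Safety and Health')  # => true
p duplicate_title?(archive, '$9.8 Million Grant')                                                          # => false
```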
If you do need the location where the match occurs, String#index is still simpler:
str.index("$8.9 Million")
# => 20
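String#index returns the offset of the first literal occurrence, or nil when there is none, so the same call works both as a position and as a found/not-found test. A short sketch reusing the str value from the question (the other variable names are illustrative):

```ruby
str    = '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>'
target = '$8.9 Million'

pos = str.index(target)        # literal substring search, no regex involved
p pos                          # => 20
p str[pos, target.length]      # => "$8.9 Million"

# nil means "not found", so the result can also be used as a boolean.
missing = str.index('$9.8 Million')
p missing                      # => nil
puts(missing ? 'found' : 'not found')
```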
Upvotes: 2
Reputation: 3073
str.match(Regexp.new(Regexp.escape("$8.9 Million ...")))
=> #<MatchData "$8.9 Million Grant for UC Center Focused...
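For comparison, here is a sketch with the complete title from the question rather than the elided one. Regexp.escape backslash-escapes metacharacters such as $ and ., and Regexp.union with a single String argument also returns a regex that matches that string literally. The MatchData gives the position through begin(0). It avoids String#match?, which only arrived in Ruby 2.4 (the question uses 2.3.0).

```ruby
str   = '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>'
title = '$8.9 Million Grant for UC Center Focused on Occupational Safety and Health'

# Regexp.escape turns "$8.9" into \$8\.9, so both characters match literally.
p Regexp.escape('$8.9')      # => "\\$8\\.9"

pattern = Regexp.new(Regexp.escape(title))
m = str.match(pattern)
p m[0] == title              # => true
p m.begin(0)                 # => 20

# Regexp.union escapes plain String arguments in the same way.
literal = Regexp.union(title)
p(str =~ literal)            # => 20
```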
Upvotes: 1