bstockwell

Reputation: 514

Match a substring that might contain reserved characters

I'm having some issues with matching one string to another if the string I'm testing for contains regex characters.

Background: I'm working on a script that migrates news articles from two legacy systems into one. In some cases, these stories are duplicated across the systems, so I'm running a script that checks stored data against an archive file (in HTML form) to see if the title of the current story matches anything in the archive.

#...(for each line) 
if line.match(title) then
    return true
end

This generally works, except when I have a regex character in the title, for example:

<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>

doesn't match

$8.9 Million Grant for UC Center Focused on Occupational Safety and Health

Here's some example output from irb to demonstrate:

2.3.0 :012 > str = '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>'
2.3.0 :020 > str.match("$8.9 Million Grant for UC Center Focused on Occupational Safety and Health")
 => nil 
2.3.0 :021 > str.match("\\$8.9 Million Grant for UC Center Focused on Occupational Safety and Health")
 => #<MatchData "$8.9 Million Grant for UC Center Focused on Occupational Safety and Health"> 
2.3.0 :022 > str.match("8.9 Million Grant for UC Center Focused on Occupational Safety and Health")
 => #<MatchData "8.9 Million Grant for UC Center Focused on Occupational Safety and Health"> 
2.3.0 :023 > 

So I'm pretty sure the $ is the issue, and that the issue stems from it being a reserved regex character.

Ruby isn't my daily language, and I'm having some trouble figuring out where to look to see if there is a Ruby method that does the match without relying on regex, treats the pattern literally, or automatically escapes potential regex special characters. Help is appreciated.

Upvotes: 1

Views: 61

Answers (2)

philomory

Reputation: 1767

If you don't need the MatchData (such as where in the string the target text occurs), a much simpler solution would be to use String#include?:

str.include?("$8.9 Million")
# => true

If you do need the location the match occurs, using String#index is still simpler:

str.index("$8.9 Million")
# => 20
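
Applied to the duplicate check from the question, a minimal sketch might look like the following. The method name `title_in_archive?` and the sample lines are hypothetical, and it assumes the archive is available as an enumerable of lines.

```ruby
# Hypothetical helper: true if any archive line contains the title verbatim.
# String#include? does a plain substring comparison, so characters such as
# "$" or "." carry no special meaning.
def title_in_archive?(lines, title)
  lines.any? { |line| line.include?(title) }
end

# Sample data for illustration only.
archive_lines = [
  '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>',
  '<span class="title">Another Story</span>'
]

found = title_in_archive?(archive_lines, "$8.9 Million Grant for UC Center Focused on Occupational Safety and Health")
missing = title_in_archive?(archive_lines, "$8x9 Million Grant")
```

For a real file, something like `File.foreach(path)` could be passed as `lines`, since it returns an enumerator when called without a block.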

Upvotes: 2

Timo Schilling

Reputation: 3073

str.match(Regexp.new(Regexp.escape("$8.9 Million ...")))
=> #<MatchData "$8.9 Million Grant for UC Center Focused...
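
To see what the escaping does, here is a small sketch using the full title from the question (the variable names are just for illustration). `Regexp.escape` returns a string in which regex metacharacters are backslash-escaped, so the resulting pattern matches the title literally.

```ruby
str = '<span class="title">$8.9 Million Grant for UC Center Focused on Occupational Safety and Health</span>'
title = "$8.9 Million Grant for UC Center Focused on Occupational Safety and Health"

# Regexp.escape backslash-escapes "$" and "." so they match literally.
escaped = Regexp.escape("$8.9")   # the string \$8\.9

# Build a pattern from the escaped title and match it against the line.
pattern = Regexp.new(Regexp.escape(title))
match = str.match(pattern)

# match[0] is the matched text; match.begin(0) is its offset within str.
matched_text = match[0]
offset = match.begin(0)
```

The unescaped title fails as a pattern because `$` is an end-of-line anchor, so nothing can follow it on the same line.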

Upvotes: 1
