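This is my code:
stopwordlist = "a|an|all"
# Open the output file once, outside the loop, so every processed line
# is written instead of each iteration truncating the file again.
File.open('0_9_2.txt', 'w') do |f|
  File.foreach('0_9.txt') do |line|
    line.downcase!
    line.gsub!(/\b#{stopwordlist}\b/, '')
    f.write(line)
  end
end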
I want to remove the words "a", "an" and "all", but the regex also matches them as substrings of other words and removes those too.
For example, with the input:
Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life
I get the output:
bromwell high is cartoon comedy. it r t the same time s some other programs bout school life
As you can see, it matched substrings too: "ran" became "r", "at" became "t", "as" became "s" and "about" became "bout".
How do I make it match only whole words, not substrings?
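The | operator has the lowest precedence of any regex operator, so it splits the entire pattern into alternatives. Your original regex matches either \ba, or an, or all\b; the \b anchors apply only to the first and last alternatives.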
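A quick way to see the precedence problem is to compare the original pattern with an explicitly grouped equivalent. This is a small sketch; the sample words are illustrative only.

```ruby
# The original pattern: | has the lowest precedence, so the
# \b anchors attach only to the first and last alternatives.
original = /\ba|an|all\b/

# What the original pattern actually means, written with explicit groups.
explicit = /(?:\ba)|(?:an)|(?:all\b)/

# Both patterns strip the same substrings from inside ordinary words.
["ran", "about", "ball"].each do |word|
  puts "#{word}: #{word.gsub(original, '')} / #{word.gsub(explicit, '')}"
end
```

Each line prints the same result for both patterns ("ran" becomes "r", "about" becomes "bout", "ball" becomes "b"), which shows that \b never guarded the middle alternative "an".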
Change the whole regex to:
/\b(?:#{stopwordlist})\b/
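Applied to the example sentence from the question, the grouped pattern removes only the standalone word. This is a minimal sketch that processes a single string rather than the file.

```ruby
stopwordlist = "a|an|all"

# The non-capturing group (?:...) keeps the alternation inside the \b anchors,
# so each alternative must match a whole word.
pattern = /\b(?:#{stopwordlist})\b/

line = "Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life"
puts line.downcase.gsub(pattern, '')
```

Only the standalone "a" is removed; "ran", "at", "as" and "about" stay intact. The spaces on both sides of a removed word remain, so "is a cartoon" becomes "is  cartoon" with two spaces; calling squeeze(' ') on the result collapses them if that matters.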
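Alternatively, make stopwordlist a Regexp instead of a string. When a Regexp is interpolated into another regex literal, Ruby embeds it as a group, so the alternation stays inside the \b anchors:
stopwordlist = /a|an|all/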
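To check what the interpolation produces, you can print the resulting pattern's source. This sketch uses a made-up sample string.

```ruby
stopwordlist = /a|an|all/

# Interpolating a Regexp into another regex embeds its to_s form,
# (?-mix:a|an|all), which is a group, so | cannot escape the \b anchors.
pattern = /\b#{stopwordlist}\b/

puts pattern.source                      # the combined pattern, including the embedded group
p "an apple and all".gsub(pattern, '')   # only the whole words "an" and "all" are removed
```

The printed source shows the (?-mix:...) wrapper around the alternation, which behaves like the explicit (?:...) group in the first fix.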
Even better, you may want to build the pattern with Regexp.union.
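A sketch of the Regexp.union approach follows. The constant names and the remove_stopwords helper are hypothetical, not part of the original answer. Regexp.union escapes any regex metacharacters in the given strings and joins them with |, so the word list can be plain data.

```ruby
# Hypothetical stop-word list; Regexp.union escapes each string
# and joins the results with |, producing /a|an|all/.
STOPWORDS = Regexp.union(%w[a an all])

# Interpolation embeds the union as a group, so \b applies to every word.
# No /i flag here: the embedded (?-mix:...) group would switch it off anyway,
# which is why the helper downcases the line first.
STOPWORD_PATTERN = /\b#{STOPWORDS}\b/

# Hypothetical helper: removes whole-word stop words from one line of text.
def remove_stopwords(line)
  line.downcase.gsub(STOPWORD_PATTERN, '')
end

puts remove_stopwords("Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life")
```

In the question's loop, the line.downcase! and line.gsub! calls could be replaced by a call to a helper like this one, and adding a stop word then only means extending the array.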