Reputation: 969
I'm trying to find elements in a document for removal. I am able to manually build a query using something like this:
article.xpath("//*[@*[contains(., 'popular')]]", "//*[@*[contains(., 'comments')]]", "//*[@*[contains(., 'social-share')]]").each do |node|
  node.remove
end
Using a variable also works:
line = 'related'
article.xpath("//*[@*[contains(., '#{line}')]]").each do |node|
node.remove
end
I'd like to put all the words in a separate file and loop over that file. I've tried the following, but it is not working (it fails silently, with no output).
file = 'stop_words.txt'
File.readlines(file).each do |line|
  article.xpath("//*[@*[contains(., '#{line}')]]").each do |node|
    node.remove
  end
end
The File.readlines(file).each loop is working fine. If I add puts line, it prints the list from stop_words.txt. Why is the article.xpath loop not working?
Upvotes: 1
Views: 43
Reputation: 10526
Each "word" in your file includes a newline at the end:
$ rm ~/test
$ printf "foo\nbar\nbaz" > ~/test
$ cat ~/test
foo
bar
baz
Now read it with Ruby:
words = File.readlines("#{Dir.home}/test")
=> ["foo\n", "bar\n", "baz"]
Note that the words have newlines, so when you do this:
article.xpath("//*[@*[contains(., '#{line}')]]")
You're really doing:
article.xpath("//*[@*[contains(., 'foo\n')]]")
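This is easy to miss when debugging with puts, because puts does not print an extra newline for a string that already ends in one. A small sketch of how to make the trailing newline visible (the variable names here are only for illustration):

```ruby
line = "foo\n"

# puts adds no second newline when the string already ends with one,
# so this looks the same on screen as printing "foo".
puts line

# inspect (which is what p uses) shows the escape sequence explicitly.
visible = line.inspect
puts visible
```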
Your best bet is to use chomp: true with .readlines:
words = File.readlines("#{Dir.home}/test", chomp: true)
=> ["foo", "bar", "baz"]
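Putting that together, here is a minimal sketch of reading the stop words and building the query, assuming one word per line and no word containing a single quote. The helpers load_stop_words and removal_xpath are hypothetical names of my own, not part of Nokogiri, and a temporary file stands in for stop_words.txt:

```ruby
require 'tempfile'

# Hypothetical helper: read one stop word per line, dropping line endings,
# surrounding whitespace, and blank lines.
def load_stop_words(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end

# Hypothetical helper: join one predicate per word into a single XPath union.
# Assumes no word contains a single quote, since words are interpolated as-is.
def removal_xpath(words)
  words.map { |word| "//*[@*[contains(., '#{word}')]]" }.join(' | ')
end

# Demo: a temporary file stands in for stop_words.txt.
Tempfile.create('stop_words') do |f|
  f.write("popular\ncomments\n\nsocial-share\n")
  f.flush
  words = load_stop_words(f.path)
  puts words.inspect
  puts removal_xpath(words)
end
```

With the document from the question, the result would be used as article.xpath(removal_xpath(words)).each(&:remove). Skipping blank lines matters: in XPath, contains(., '') is true for every attribute, so an empty word would match every element that has any attribute.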
Whether or not this actually solves the underlying problem, I can't say. But I can tell you for certain that this is a bug in your code that has to be resolved.
Upvotes: 2