How to Use XPath Loop Inside a Ruby Loop

Question

I'm trying to find elements in a document for removal. I am able to manually build a query using something like this:

article.xpath("//*[@*[contains(., 'popular')]]", "//*[@*[contains(., 'comments')]]", "//*[@*[contains(., 'social-share')]]").each do |node|
  node.remove
end

Using a variable also works:

line = 'related'
article.xpath("//*[@*[contains(., '#{line}')]]").each do |node|
  node.remove
end

I'd like to put all the words in a separate file and loop over that file. I've tried the following, but it is not working (it fails silently, with no output).

file = 'stop_words.txt'
File.readlines(file).each do |line|
  article.xpath("//*[@*[contains(., '#{line}')]]").each do |node|
    node.remove
  end
end

The File.readlines(file).each loop is working fine. If I add puts line, it prints the list from stop_words.txt. Why is the article.xpath loop not working?

anothermh · Accepted Answer

Each "word" in your file includes a newline at the end:

$ rm ~/test

$ printf "foo\nbar\nbaz" > ~/test

$ cat ~/test
foo
bar
baz

Now read it with Ruby:

words = File.readlines("#{Dir.home}/test")
=> ["foo\n", "bar\n", "baz"]

Note that the words have newlines, so when you do this:

article.xpath("//*[@*[contains(., '#{line}')]]")

You're really doing:

article.xpath("//*[@*[contains(., 'foo\n')]]")

Your best bet is to use chomp: true with .readlines:

words = File.readlines("#{Dir.home}/test", chomp: true)
=> ["foo", "bar", "baz"]
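
The difference is easy to check without touching the real word list. Here is a minimal sketch that uses a temporary file as a stand-in for stop_words.txt; the file contents and variable names are made up for illustration. It also drops blank lines, on the assumption that the list might contain one: in XPath, contains(., '') is true for every string, so an empty word would match every element that has any attribute.

```ruby
require 'tempfile'

# Temporary stand-in for stop_words.txt; the contents are illustrative only.
file = Tempfile.new('stop_words')
file.write("popular\ncomments\n\nsocial-share\n")
file.close

# Without chomp, every entry keeps its trailing "\n".
raw_words = File.readlines(file.path)

# With chomp: true the newlines are stripped; reject(&:empty?) skips blank lines.
clean_words = File.readlines(file.path, chomp: true).reject(&:empty?)

p raw_words
p clean_words

file.unlink
```

Only the entries in clean_words are safe to interpolate into the contains(...) expression as they are.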

Whether or not this actually solves the underlying problem, I can't say. But I can tell you for certain that this is a bug in your code that has to be resolved.
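
One possible way to fold the fix back into the original loop, sketched without Nokogiri so that it runs on its own. The helper name stop_word_queries is hypothetical, not part of any library. It cleans each word and builds one XPath expression per word, using the same expression as in the question. It assumes plain words with no single quotes, since a quote would break the string literal inside the expression.

```ruby
# Hypothetical helper (not part of Nokogiri): builds one XPath expression per
# stop word, matching any element that has an attribute containing the word.
# Assumes the words contain no single quotes.
def stop_word_queries(words)
  words.map(&:strip).reject(&:empty?).map do |word|
    "//*[@*[contains(., '#{word}')]]"
  end
end

# Sample input with the kinds of debris a word file can contain.
queries = stop_word_queries(["popular\n", "comments", "", " social-share "])
puts queries

# With a parsed Nokogiri document in `article`, as in the question, the
# expressions could be passed together, like the hand-built query there:
#   article.xpath(*queries).each(&:remove)
```

Calling strip goes slightly further than chomp: true, because it also removes stray spaces around a word. Whether that is wanted depends on the word list.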
