Brad West

Reputation: 969

How to Use XPath Loop Inside a Ruby Loop

I'm trying to find elements in a document for removal. I am able to manually build a query using something like this:

article.xpath("//*[@*[contains(., 'popular')]]", "//*[@*[contains(., 'comments')]]", "//*[@*[contains(., 'social-share')]]").each do |node|
  node.remove
end

Using a variable also works:

line = 'related'
article.xpath("//*[@*[contains(., '#{line}')]]").each do |node|
  node.remove
end

I'd like to put all the words in a separate file and loop over that file. I've tried the following, but it does not work (it fails silently, with no output):

file = 'stop_words.txt'
File.readlines(file).each do |line|
  article.xpath("//*[@*[contains(., '#{line}')]]").each do |node|
    node.remove
  end
end

The File.readlines(file).each loop is working fine. If I add puts line, it prints the list from stop_words.txt. Why is the article.xpath loop not working?

Upvotes: 1

Views: 43

Answers (1)

anothermh

Reputation: 10526

Each "word" in your file includes a newline at the end:

$ rm ~/test

$ printf "foo\nbar\nbaz" > ~/test

$ cat ~/test
foo
bar
baz

Now read it with Ruby:

words = File.readlines("#{Dir.home}/test")
=> ["foo\n", "bar\n", "baz"]

Note that the words have newlines, so when you do this:

article.xpath("//*[@*[contains(., '#{line}')]]")

You're really doing:

article.xpath("//*[@*[contains(., 'foo\n')]]")
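
This is also why puts line looked fine in the question: puts does not add a second newline when the string already ends in one, so the printed list looks the same either way. Here is a minimal sketch of how to make the stray newline visible. It uses a temporary file as a stand-in for stop_words.txt, and the file contents are assumed for illustration:

```ruby
require 'tempfile'

# Stand-in for stop_words.txt; the contents are made up for illustration.
file = Tempfile.new('stop_words')
file.write("popular\ncomments\n")
file.close

lines = File.readlines(file.path)

# inspect (or p) shows escape sequences that puts hides.
inspected = lines.map(&:inspect)
puts inspected

# The interpolated query carries the newline inside the quoted literal.
query = "//*[@*[contains(., '#{lines.first}')]]"
p query
```

Using p line instead of puts line inside the original loop would have exposed the same thing.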

Your best bet is to use chomp: true with .readlines:

words = File.readlines("#{Dir.home}/test", chomp: true)
=> ["foo", "bar", "baz"]

Whether or not this actually solves the underlying problem, I can't say. But I can tell you for certain that this is a bug in your code that has to be resolved.
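
Putting it together, the loop from the question might be sketched like this. The helper name stop_word_queries is hypothetical, and stripping whitespace and dropping blank lines are my own additions: as far as I know, an empty word makes contains(., '') true for every attribute, so a blank line in the file could match far more than intended.

```ruby
require 'tempfile'

# Hypothetical helper: returns one XPath query per non-blank line of the file.
def stop_word_queries(path)
  File.readlines(path, chomp: true)
      .map(&:strip)
      .reject(&:empty?)
      .map { |word| "//*[@*[contains(., '#{word}')]]" }
end

# Stand-in for stop_words.txt, including a blank line and trailing whitespace.
file = Tempfile.new('stop_words')
file.write("popular\ncomments \n\nsocial-share\n")
file.close

queries = stop_word_queries(file.path)
puts queries

# With a Nokogiri document in `article`, as in the question:
# queries.each do |query|
#   article.xpath(query).each(&:remove)
# end
```

A word containing a single quote would still break the quoted XPath literal, so that case needs separate handling.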

Upvotes: 2
