XPath: how to get text from this and next tag?

Question

I have HTML like this:

<h1>Hello1</h1>
<p>World1</p>
<h1>Hello2</h1>
<p>World2</p>
<h1>Hello2</h1>
<p>World2</p>

I need to get each pair together in one pass: Hello1 with World1, Hello2 with World2, and so on.

UPDATE: I am using the Ruby Mechanize library.

ezkl · Accepted Answer

The Ruby library "Mechanize" uses the Nokogiri parsing library, so you can call Nokogiri directly. One potential solution might look something like this:

require 'mechanize'
require 'pp'

html = "<h1>Hello1</h1>
<p>World1</p>
<h1>Hello2</h1>
<p>World2</p>
<h1>Hello2</h1>
<p>World2</p>"

results = []

Nokogiri::HTML(html).xpath("//h1").each do |header|
  # p[1] picks the first <p> sibling after this <h1>
  paragraph = header.xpath("following-sibling::p[1]").text
  results << [header.text, paragraph]
end

pp results

EDIT: This example was tested with Mechanize v2.0.1 which uses Nokogiri ~v1.4. I also tested directly against Nokogiri v1.5.0 without issue.

EDIT #2: This example answers a follow-up question to the original solution:

require 'nokogiri'
require 'pp'

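# Each heading's text is nested as h1 > p > font > b, which is invalid HTML
html = <<HTML
<h1>
<p><font><b>abide by (something)</b></font></p>
</h1>
<p>- to follow the rules of something</p>
<p>The cleaning staff must abide by the rules of the school.</p>
<h1>
<p><font><b>able to breathe easily again</b></font></p>
</h1>
<p>My friend was able to breathe easily again when his company did not go bankrupt.</p>
HTML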

doc = Nokogiri::HTML(html)

results = []

doc.xpath("//h1").each do |header|
  # After parsing, the nested <p> is a sibling that follows the <h1>;
  # p[1] keeps each heading to its own first paragraph
  results << header.xpath("following-sibling::p[1]/font/b").text
end

pp results

An h1 that contains block-level elements such as p is invalid HTML, so Nokogiri repairs the markup during parsing and the nested elements end up as siblings that follow the h1. Getting at the formerly nested elements is therefore very similar to the original solution.
