How to get attribute values using Nokogiri

Question

I have a webpage whose DOM structure I do not know, but I do know the text I need to find in it. To get the XPath of that text, I do this:

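require 'nokogiri'

path = []  # collects the XPath of every matching text node
doc = Nokogiri::HTML(webpage)  # webpage holds the page's HTML source
doc.traverse do |node|
  # keep only text nodes whose content is exactly the text we want
  path << node.path if node.text? && node.content == "my text"
end
puts path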

Now suppose I get output like this:

   /html/body/div[4]/div[8]/div/div[38]/div/p/text()

Then later, when I access this webpage again, I can do this:

    doc.xpath("#{path[0]}")

instead of traversing the whole DOM tree every time I want the text.

I want to do some further processing, and for that I need to know which of the element nodes in the XPath output above have attributes, and what the attribute values are. How would I achieve that? The output I want is:

    #=> desired output
    { p => p_attr_value, div => div_attr_value, div[38] => div[38]_attr_value, ... }

I am not having trouble finding the nodes where "my text" lies. I wanted the full XPath of the "my text" node, which is why I did the whole traversal. Now that I have the full XPath, I want the attributes associated with each element node I passed through on the way to the "my text" node.

Constraint: I can't use any of the developer tools available in a web browser.

PS: I am a newbie in Ruby and Nokogiri.

Dimitre Novatchev · Accepted Answer

To select all attributes of an element that is selected using the XPath expression someExpr, you need to evaluate a new XPath expression:

someExpr/@*

where someExpr must be substituted with the real XPath expression used to select the particular element.

This selects all attributes of all elements (we assume there is just one) that are selected by the XPath expression someExpr.

For example, if the element we want is selected by:

/a/b/c

then all of its attributes are selected by:

/a/b/c/@*
