
Hpricot search how to

I would like to search a web page and, if the search finds a match, read one of its attributes. Here is the web page: link text

I want to know whether the page header contains a meta tag whose property attribute has the value "og:title". If it does, I want the value of its content attribute.

If we look at the source of the page, it has a portion like this:

<meta
property="og:title" content="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]" />

So I want a true result for the og:title query, and the value Explore the Titanic Wreck Site via Social Media [EXCLUSIVE] from the next search. How do I do this properly?

search("/html/head/meta[(@property='og:title']") doesn't return what I want.

Any suggestions?

Upvotes: 0

Answers (3)

user529543

Reputation:

Thanks for the answers. When I posted my question I didn't realize I had a mistake in the search. It was Friday evening...

The correct search is

elements = @doc.search("/html/head/meta[@property='og:title']")

The stray ( character before @property has been removed from the expression.

This gives:

elements = <meta property="og:title" content="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]" />

as the result. Then I check whether anything was found, and if so, I read the content value:

if elements.nil?
  puts 'not found'
elsif elements.size > 0
  puts "Found one, og:title = #{elements}"
  content = elements.attr("content")
  puts content # displays the content value (it will be processed later)
else
  # Can the flow reach this branch? Theoretically yes: it would run when the
  # search returns a non-nil but empty result set.
  puts 'not found'
end
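The check-then-extract flow above can be folded into one small helper. This is only a sketch: it uses REXML from Ruby's standard library instead of Hpricot so that it runs without extra gems, and the `PAGE` sample markup and the `og_title` helper name are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical sample markup standing in for the fetched page.
PAGE = <<~XML
  <html>
    <head>
      <meta property="og:title" content="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]" />
    </head>
    <body></body>
  </html>
XML

# Hypothetical helper: returns the og:title content, or nil when the tag is absent.
def og_title(markup)
  doc = REXML::Document.new(markup)
  node = REXML::XPath.first(doc, "/html/head/meta[@property='og:title']")
  node && node.attributes['content']
end

title = og_title(PAGE)
puts title ? "Found og:title = #{title}" : 'not found'

# A page without the tag yields nil, so a single nil check covers the "not found" case.
puts og_title('<html><head></head></html>').inspect
```

Returning nil for a missing tag means the caller needs only one branch for "not found", rather than separate nil and empty checks.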

Upvotes: 1

the Tin Man

Reputation: 160551

Your XPath has an error in it, and it is also too restrictive:

search("/html/head/meta[(@property='og:title']")

should be:

search("/html/head/meta[@property='og:title']")

to fix the error. I'd simplify it to:

search("//meta[@property='og:title']")
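To illustrate why the absolute path can be too restrictive, here is a rough sketch using REXML from the standard library (a stand-in chosen so the snippet runs without gems, not the library used in this thread). The markup is a made-up example in which the tag is not a direct child of /html/head.

```ruby
require 'rexml/document'

# Hypothetical markup where the meta tag does not sit under /html/head.
doc = REXML::Document.new(<<~XML)
  <html>
    <body>
      <meta property="og:title" content="Example title" />
    </body>
  </html>
XML

strict = REXML::XPath.first(doc, "/html/head/meta[@property='og:title']")
loose  = REXML::XPath.first(doc, "//meta[@property='og:title']")

puts strict.inspect               # the absolute path finds nothing here
puts loose.attributes['content']  # the descendant search still finds the tag
```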

Also, it's not quite clear what you want to do. Do you want to find

<meta 
  property="og:title" 
  content="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]" 
 />

and extract the content attribute? Or do you want to locate the tag, confirm it has both the "og:title" property attribute and the "Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]" content, and then do further processing?

That said, often it's simpler to use CSS accessors instead of XPath. I prefer using Nokogiri, which has both XPath and CSS selectors; I'm using CSS below:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('http://mashable.com/2010/08/06/expedition-titanic'))
(doc % 'meta[property="og:title"]')
=> #<Nokogiri::XML::Element:0x8084ee48 name="meta" attributes=[#<Nokogiri::XML::Attr:0x8084ed58 name="property" value="og:title">, #<Nokogiri::XML::Attr:0x8084ed1c name="content" value="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]">]>

Nokogiri and Hpricot support the / and % shorthand for search and at respectively. "Search" returns an array of all matches, and "at" returns only the first match. So, the example above gets the first node using the CSS, showing this is the right track. I'm not sure how to use CSS to match two attributes in the same tag, so I'll go after all <meta> tags with property="og:title", then filter based on the content= attribute:

(doc / 'meta[property="og:title"]').select{ |n| n['content'][/titanic wreck site/i] }
=> [#<Nokogiri::XML::Element:0x8084ee48 name="meta" attributes=[#<Nokogiri::XML::Attr:0x8084ed58 name="property" value="og:title">, #<Nokogiri::XML::Attr:0x8084ed1c name="content" value="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]">]>]

At that point we've got the right node in the returned array, so you can extract whatever you want, or dive into its children and sack and pillage. To do that you'll want to use .first or [0] to get at the actual node for further processing:

(doc / 'meta[property="og:title"]').select{ |n| n['content'][/titanic wreck site/i] }.first
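One alternative to filtering in Ruby is to put both conditions into a single XPath predicate. The sketch below assumes REXML from the standard library (chosen so it runs without gems) and a made-up two-tag sample document. Note that XPath's contains() is case-sensitive, unlike the /i regex above.

```ruby
require 'rexml/document'

# Hypothetical sample markup with two meta tags.
doc = REXML::Document.new(<<~XML)
  <html>
    <head>
      <meta property="og:title" content="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]" />
      <meta property="og:type" content="article" />
    </head>
  </html>
XML

# Both attribute tests sit in one predicate, joined with "and".
xpath = "//meta[@property='og:title' and contains(@content, 'Titanic Wreck Site')]"
matches = REXML::XPath.match(doc, xpath)

puts matches.size
puts matches.first.attributes['content']
```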

Update based on OP's response, using Nokogiri still:

>> meta = (doc % 'meta[property="og:title"]')['content']
>> meta #=> "Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]"

Upvotes: 1

user357812

Reputation:

Use:

/html/head/meta[@property='og:title']/@content
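As a rough illustration of what that expression selects: the trailing /@content step returns the attribute node itself rather than the element. The sketch uses REXML from the standard library and made-up sample markup. Hpricot's XPath support is limited, so whether it accepts an attribute step like this is not something this sketch verifies.

```ruby
require 'rexml/document'

# Hypothetical sample markup standing in for the fetched page.
doc = REXML::Document.new(<<~XML)
  <html>
    <head>
      <meta property="og:title" content="Explore the Titanic Wreck Site via Social Media [EXCLUSIVE]" />
    </head>
  </html>
XML

# The final step selects the content attribute node directly.
attr = REXML::XPath.first(doc, "/html/head/meta[@property='og:title']/@content")

puts attr ? attr.value : 'not found'
```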

Upvotes: 2
