critic

Reputation: 69

Parsing images via nokogiri and xpath

I currently have a piece of code that grabs a product title, description, and price, and for that it works great. However, I also need it to get the image URL, which is where my dilemma is. I tried using an XPath expression inside the loop at the bottom, but it lists ALL the images with a width of 220 for EVERY product, which I don't want at all. So basically I get something like this:

product 1 Title here
product 1 Description here
product 1 price here
http://www.test.com/product1.jpg
http://www.test.com/product2.jpg
http://www.test.com/product3.jpg
http://www.test.com/product4.jpg


product 2 Title here
product 2 Description here
product 2 price here
http://www.test.com/product1.jpg
http://www.test.com/product2.jpg
http://www.test.com/product3.jpg
http://www.test.com/product4.jpg

Whereas I obviously want product 1 to have just http://www.test.com/product1.jpg, product 2 to have just http://www.test.com/product2.jpg, and so on. The images are in a div tag with no class or ID, which is why I didn't simply use a CSS selector. I'm really new to Ruby/Nokogiri, so any help would be great.

require 'nokogiri'
require 'open-uri'


url = "http://thewebsitehere"

data = Nokogiri::HTML(URI.open(url))

products = data.css('.item')



products.each do |product|
    puts product.at_css('.vproduct_list_title').text.strip
    puts product.at_css('.vproduct_list_descr').text.strip
    puts product.at_css('.price-value').text.strip
    puts product.xpath('//img[@width = 220]/@src').map {|a| a.value }

end

Upvotes: 0

Views: 1032

Answers (2)

Dave S.

Reputation: 6419

Try changing:

puts product.xpath('//img[@width = 220]/@src').map {|a| a.value }

to:

puts product.xpath('.//img[@width = 220]/@src').map {|a| a.value }

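The leading '.' makes the expression relative to the current node, so it matches only images that are descendants of that product. Without it, '//' starts the search at the document root, which is why every product listed every image on the page (e.g. product 1 was also picking up product 2's images).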

Upvotes: 2

Patrick Oscity

Reputation: 54674

File.basename will return only the filename:

File.basename('http://www.test.com/product4.jpg')
#=> "product4.jpg"

So you probably want something like this:

puts product.xpath('.//img[@width = 220]/@src').map {|a| File.basename(a.value) }
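One caveat: File.basename only splits on '/', so a src with a query string would keep it in the result. A possible sketch, assuming the image URLs might carry query strings (the question does not say either way), is to take the path component first. The helper name image_filename and the second URL are hypothetical.

```ruby
require 'uri'

# Hypothetical helper (not part of the original answer): returns the file
# name portion of an image URL, ignoring any query string or fragment.
def image_filename(url)
  File.basename(URI.parse(url).path)
end

puts image_filename('http://www.test.com/product4.jpg')
puts image_filename('http://www.test.com/images/product4.jpg?size=220')
```

Both calls print product4.jpg, whereas File.basename on the second URL alone would return "product4.jpg?size=220".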

Upvotes: 0
