Reputation: 167
I am trying to scrape data from a web page using Nokogiri. Specifically, I want the list of service centers from http://www.cardekho.com/Maruti/Noida/car-service-center.htm
The code I have written for this is:
require 'open-uri'
require 'nokogiri'

# URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open("http://www.cardekho.com/Maruti/Noida/car-service-center.htm"))
doc.css('.delrname').each do |node|
  puts node.text
end
I have tried a number of combinations of CSS selectors, but none of them gives the desired result. Can anybody suggest the selector that will correctly scrape the list of service centers from this page?
Thanks in advance
PS: The same code (with an appropriate CSS selector) works as expected when I test it on other websites, but it does not work on this one.
Upvotes: 0
Reputation: 1365
Alternatively, you can use a regular expression to get more detailed results (name, address, mobile number, link and email), for example:
# Matches one dealer block; the mailto link text is matched with [^<]* so any dealer's address matches
DEALER_REGEX = /(<div class="delrname">([^<]*)<\/div><p>([^<]*)<\/p><div><div class="delermobcol "><div class="clearfix"><span class="mobico sprite"><\/span><div class="mobno">([^<]*)<\/div><\/div><div class="clear"><\/div><div class="viewsercntr"><a href="([^"]*)" title="View Car Dealers for Maruti in Noida">View Car Dealers for Maruti in Noida<\/a><\/div><\/div><div class="delermoilcol"><!----><div class="clearfix"><span class="mailico sprite"><\/span><div class="mobno"><a href="mailto:([^"]*)" target="_top">[^<]*<\/a><\/div>)/
You can then break out the individual fields. Note that scan has to run on the raw HTML string, because a Nokogiri document has no scan method:
require 'open-uri'

html = URI.open("http://www.cardekho.com/Maruti/Noida/car-service-center.htm").read
matches = html.scan(DEALER_REGEX)
matches.each do |_entire_match, name, address, mobile, link, email|
  # Each match is an array of the six capture groups, in order
  puts [name, address, mobile, link, email].join(" | ")
end
Upvotes: 0
Reputation: 2584
Your code seems to work. I only removed the whitespace from the URL:
doc = Nokogiri::HTML(URI.open("http://www.cardekho.com/Maruti/Noida/car-service-center.htm"))
Then I ran it, and this is the output:
$ ruby file.rb
Fast Track Auto Care India
Jkm Motors
Mangalam Motors
Motorcraft India
Motorcraft India
Rohan Motors
Rohan Motors
Rohan Motors
Vipul Motors
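Stray whitespace in a copied URL is easy to miss. A minimal sketch of guarding against it, assuming any whitespace in the string is accidental (a real space in a URL would have to be percent-encoded as %20 anyway); clean_url is a hypothetical helper name, not part of open-uri.

```ruby
require 'uri'

# Hypothetical helper: remove any whitespace that crept into a pasted URL,
# then let URI.parse raise URI::InvalidURIError if the result is still malformed.
def clean_url(raw)
  URI.parse(raw.gsub(/\s+/, "")).to_s
end

url = clean_url(" http://www.cardekho.com/Maruti/Noida/ car-service-center.htm ")
puts url # prints http://www.cardekho.com/Maruti/Noida/car-service-center.htm
```

The cleaned string can then be handed to URI.open (or open on older Rubies) before parsing the page with Nokogiri.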
Upvotes: 2