Seeing as I didn't ask this very well the first time, here's another go.
I'm trying to follow this tutorial here: http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
I'm also trying to scrape the price from this page: http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169E?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1
What I want is to extract all three of these tickets (name and price, and ideally as much information about the tickets/prices as possible) and use them in my web application.
I can't show you the result because it's far too big, but I can tell you that I never hit the second byebug. Here's my code:
require 'nokogiri'
require 'open-uri'
require 'byebug'

url = "http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169E?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1"
doc = Nokogiri::HTML(URI.open(url)) # URI.open is needed for URLs on Ruby 3.0+
byebug
doc.css(".item").each do |item|
  title = item.at_css(".fru").text
  byebug
end
Unfortunately, to help, you'll ideally have to run this yourself to see how huge the page is!
Edit: bearing in mind my screen is 27 inches, the text fills the whole screen.
Here's an image of what I got from the first byebug.
Further to this, I believe this image shows all I need; it's just a matter of getting it out.
Thanks, Sam
The main issue here is that the price is written inside JavaScript, not in the HTML itself. Nokogiri only parses XML and HTML, so you need the help of a regex. Before you read the full code, here are a few tips to help you understand it.
First, I search for all <script> tags using this code:
doc.xpath("//script[@type='text/javascript']/text()").each
It returns more than 100 objects, so I needed to find which of them contain the name and the price. I found that each specific script I needed to read had some unique text in it, so I looped through all of them and tested whether each one contains that unique string. Here are the images to help you understand:
Once I found those pieces, I just used a regex to extract the price and name. Here is the working code; just copy, paste and run it.
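The extraction relies on Ruby's `String#[]` with a regex. Passing a capture-group index as the second argument returns just the captured text, or `nil` when nothing matches, which is safer than chaining `gsub!` (it returns `nil` when it makes no change). The JavaScript string, the `"Example Event"` name and the `venueName` key below are invented for illustration.

```ruby
# Hypothetical snippet of the JavaScript text found in the matching <script> tag
js = 'TM.Tracking.satellite({"eventName":"Example Event","total_price":"49.50"});'

# Capture group 1 holds only the value between the quotes
event_name = js[/"eventName":"(.*?)"/, 1]
price_text = js[/"total_price":"(\d+\.\d+)"/, 1]

# Convert only when the pattern matched; a missing price stays nil
ticket_price = price_text&.to_f

puts event_name
puts ticket_price

# A pattern that does not match yields nil instead of raising
puts js[/"venueName":"(.*?)"/, 1].inspect
```

This prints `Example Event`, `49.5` and `nil`.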
require 'nokogiri'
require 'open-uri'

def get_name_and_price
  ticketmaster_url = "http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169E?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1"
  # URI.open (rather than bare Kernel#open) is required for URLs on Ruby 3.0+
  doc = Nokogiri::HTML(URI.open(ticketmaster_url))

  event_name = nil
  ticket_price = nil

  # The data lives inside inline JavaScript, so inspect the text of each <script> tag
  doc.xpath("//script[@type='text/javascript']/text()").each do |text|
    js = text.content
    if js =~ /TM\.Tracking\.satellite/
      # Capture group 1 is the value between the quotes after "eventName":
      event_name = js[/"eventName":"(.*?)"/, 1]
    elsif js =~ /more_options_on_polling/
      # Capture group 1 is the numeric price string after "total_price":
      price = js[/"total_price":"(\d+\.\d+)"/, 1]
      ticket_price = price.to_f if price
    end
  end

  # Interpolation avoids a TypeError if either value was not found (nil)
  puts "Event name: #{event_name}"
  puts "Ticket price: #{ticket_price}"
end

get_name_and_price
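The code above keeps only one `total_price` value, while the question asks for all three tickets. If each ticket's price appears as its own `"total_price"` entry in the same script (an assumption; the page's actual JavaScript hasn't been checked here), `String#scan` with capture groups collects every match. The sample string, the `"name"` key and the ticket names are made up.

```ruby
# Hypothetical script text containing one entry per ticket type
js = '[{"name":"Stalls","total_price":"49.50"},' \
     '{"name":"Circle","total_price":"39.50"},' \
     '{"name":"Upper Circle","total_price":"29.50"}]'

# With two capture groups, scan returns an array of [name, price] pairs
tickets = js.scan(/"name":"(.*?)","total_price":"(\d+\.\d+)"/).map do |name, price|
  { name: name, price: price.to_f }
end

tickets.each { |t| puts "#{t[:name]}: #{t[:price]}" }
```

If the whole array can be isolated as valid JSON, parsing it with Ruby's `json` library would be less brittle than a regex tied to key order.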