sam roberts

Reputation: 217

Nokogiri web scrape issue

Since I didn't ask this very well the first time, here's another go.

I'm trying to follow this tutorial here: http://railscasts.com/episodes/190-screen-scraping-with-nokogiri

I'm currently also trying to scrape the price from this website link here: http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169E?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1

What I want to achieve is to get all three of these tickets (name and price, and hopefully as much other information about the tickets/prices as possible) and use them in my web application.

I can't show you the result because it's far too big, but I can tell you that I never hit the second byebug. Here's my code.

url = "http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169E?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1"
doc = Nokogiri::HTML(open(url))
byebug
doc.css(".item").each do |item|
  title = item.at_css(".fru").text
  byebug
end

Unfortunately, to help you'll ideally have to try this yourself to see the horrible page size! Haha!

Edit: OK, bearing in mind my screen is 27 inches, the text FILLS the screen.

Here's an image of what I got back at the first byebug.

huge image of response

Further to this, I believe this image shows all I need? It's just a matter of getting it out.

what i might need!

Thanks, Sam

Upvotes: 1

Views: 66

Answers (1)

Volodymyr Balytskyy

Reputation: 577

The main issue here is that the price is written inside JavaScript, not in the HTML itself. Nokogiri only parses XML and HTML, so you need the help of a regex to pull values out of the script text. Before you read the full code, here are a few tips to understand it.

First I search for all <script> tags using this code:

doc.xpath("//script[@type='text/javascript']/text()").each

It returns more than 100 objects, so I needed to find out which of them contain the name and the price. The specific scripts I needed each contain some unique text, so I loop through all 100+ objects and test whether each one contains that unique string. Here are the images to help you understand:

enter image description here

enter image description here

Once I had found those pieces, I just used regex to extract the price and name. Here is the working code; just copy, paste, and run it.

require 'nokogiri'
require 'open-uri'

def get_name_and_price
  ticketmaster_url = "http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169E?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1"
  # URI.open is needed on Ruby 3.0+, where a bare `open` no longer accepts URLs.
  doc = Nokogiri::HTML(URI.open(ticketmaster_url))
  event_name = nil
  ticket_price = nil
  # The data lives in inline JavaScript, so walk the text of every script tag.
  doc.xpath("//script[@type='text/javascript']/text()").each do |text|
    js = text.content
    if js =~ /TM\.Tracking\.satellite/
      # Capture group 1 is the value between the quotes.
      event_name = js[/"eventName":"(.*?)"/, 1]
    elsif js =~ /more_options_on_polling/
      ticket_price = js[/"total_price":"(\d+\.\d+)"/, 1]&.to_f
    end
  end

  # Interpolation prints an empty value instead of raising if nothing matched.
  puts "Event name: #{event_name}"
  puts "Ticket price: #{ticket_price}"
end

get_name_and_price
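If you want to experiment with the regex step without hitting the site, here is a minimal sketch that runs on plain strings. The two sample strings and the helper `extract_value` are hypothetical; they only imitate the shape of the fragments the regexes above look for, and the real scripts on the page are much larger.

```ruby
# Hypothetical script contents, shaped like the fragments the regexes target.
TRACKING_JS = 'TM.Tracking.satellite({"eventName":"Sample Event","venue":"Sample Hall"});'
POLLING_JS  = 'var more_options_on_polling = {"total_price":"42.50","currency":"GBP"};'

# Pull the quoted value that follows `"key":` out of a blob of JavaScript.
# Returns nil when the key is absent instead of raising.
def extract_value(js, key)
  js[/"#{Regexp.escape(key)}":"(.*?)"/, 1]
end

event_name   = extract_value(TRACKING_JS, 'eventName')
ticket_price = extract_value(POLLING_JS, 'total_price')&.to_f

puts "Event name: #{event_name}"
puts "Ticket price: #{ticket_price}"
```

Using a capture group rather than chained `gsub!` calls is a deliberate choice here: `gsub!` returns nil when it makes no substitution, so a chain of them can fail with a NoMethodError on input that does not match exactly.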

Upvotes: 1
