MARKUS M

Reputation: 11

Nokogiri help without spaces

I have the following code:

#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'cora'
require 'eat'
#require 'timeout'

doc = Nokogiri::HTML(open("http://mobile.bahn.de/bin/mobil/bhftafel.exe/dox?input=Richard-Strauss-Stra%DFe%2C+M%FCnchen%23625127&date=27.01.12&time=20%3A41&productsFilter=1111111111000000&REQTrain_name=&maxJourneys=10&start=Suchen&boardType=Abfahrt&ao=yes"))
doc = doc.xpath('//div').each do |node|
  puts node.content
end

How can I remove the <p> tags and the extra whitespace from the output?

Upvotes: 1

Views: 416

Answers (1)

Phrogz

Reputation: 303205

Here's a guess at what you might want:

require 'nokogiri'
require 'open-uri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open("http://mobile.bahn.de/bin/mobil/bhftafel.exe/dox?input=Richard-Strauss-Stra%DFe%2C+M%FCnchen%23625127&date=27.01.12&time=20%3A41&productsFilter=1111111111000000&REQTrain_name=&maxJourneys=10&start=Suchen&boardType=Abfahrt&ao=yes"))
# Drop every <p> element nested inside a <div>
doc.xpath('//div//p').remove
doc.xpath('//div').each do |node|
  text = node.text.gsub(/\n([ \t]*\n)+/,"\n").gsub(/^\s+|\s+$/,'')
  puts text unless text.empty?
end

This removes every <p> element nested inside a <div>, then collapses blank lines and strips leading and trailing whitespace from each line of the remaining text. Finally, it skips any <div> whose cleaned-up text is an empty string.
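To see what the two gsub calls do without fetching the page, here is a small sketch of the whitespace cleanup on its own. The clean_text helper and the sample string are made up for illustration; the sample only stands in for what node.text might look like after the <p> elements are gone.

```ruby
# Hypothetical helper wrapping the two gsub calls from the code above
def clean_text(text)
  # Collapse a newline followed by one or more blank lines into a single newline
  collapsed = text.gsub(/\n([ \t]*\n)+/, "\n")
  # Strip whitespace at the start and end of every line (^ and $ are per-line in Ruby)
  collapsed.gsub(/^\s+|\s+$/, '')
end

# Made-up sample standing in for node.text
sample = "\n  \n  S4 Ebersberg\n\n   \n  20:45  \n"

cleaned = clean_text(sample)
puts cleaned unless cleaned.empty?
# prints "S4 Ebersberg" and "20:45" on two lines
```

A string containing only whitespace comes back empty, which is why the empty? check keeps otherwise blank <div> elements out of the output.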

Edit: To make the date variable, wrap the code above in a function and build the URL with string interpolation. For example:

require 'nokogiri'
require 'open-uri'
def get_data( date )
  # The URL expects the date as dd.mm.yy, e.g. 27.01.12
  date_string = date.strftime('%d.%m.%y')
  url = "http://mobile.bahn.de/…more…#{date_string}…more…"
  doc = Nokogiri::HTML(URI.open(url))
  # more code from above
end
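As a rough sketch of just the URL-building step, the following reuses the query string from the question and swaps in only the date. The names BASE_URL and departures_url are made up for this example, and every parameter other than the date is copied verbatim from the question's URL on the assumption that it should stay fixed.

```ruby
require 'date'

# Path taken from the URL in the question
BASE_URL = 'http://mobile.bahn.de/bin/mobil/bhftafel.exe/dox'

# Hypothetical helper: returns the departure-board URL for the given date.
# All query parameters except the date are copied unchanged from the question.
def departures_url(date)
  date_string = date.strftime('%d.%m.%y') # dd.mm.yy, as in date=27.01.12
  "#{BASE_URL}?input=Richard-Strauss-Stra%DFe%2C+M%FCnchen%23625127" \
    "&date=#{date_string}&time=20%3A41&productsFilter=1111111111000000" \
    "&REQTrain_name=&maxJourneys=10&start=Suchen&boardType=Abfahrt&ao=yes"
end

url = departures_url(Date.new(2012, 1, 27))
puts url
```

The resulting string can then be passed to URI.open and Nokogiri::HTML in the same way as the hard-coded URL.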

Upvotes: 1
