Reputation: 13
I am trying to fetch different URLs, e.g. site.com/page=1, page=2, and so on. All fetched data should be stored in an HTML file so I can read it with Nokogiri.
If I only read one URL and write it into a file, it works perfectly. When I extended the script to read all possible URLs, it stopped working.
def getData
  @a = 1
  array = Array.new
  while @a < 5 do
    uri = URI.parse("https://exampel.com?pageNr=" + @a.to_s + "&Size=10")
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    request = Net::HTTP::Get.new(uri.request_uri)
    puts "Fetching data from " + uri.request_uri
    #puts @cookie
    request['Cookie'] = @cookie
    response = http.request(request)
    if response != nil
      array[@a] = response.body
      @a += 1
    end
  end
  File.write('output.html', array)
end
Upvotes: 1
Views: 44
Reputation: 106882
There is no need to write a file; you can pass the `response.body` directly to Nokogiri:
def get_data
  (1..5).each do |i|
    uri = URI.parse("https://exampel.com?pageNr=#{i}&Size=10")
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    puts "Fetching data from: #{uri.request_uri}"
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Cookie'] = @cookie
    response = http.request(request)
    if response
      puts "processing document..."
      document = Nokogiri::HTML(response.body)
      # process the document
    end
  end
end
See: Nokogiri Tutorial: How to parse a document
Upvotes: 1