My Ruby/Nokogiri script is:
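```ruby
require 'rubygems'   # only needed on Ruby 1.8
require 'nokogiri'
require 'open-uri'

# Block form closes (and flushes) the file even if an exception escapes the loop.
File.open("enterret.txt", "w") do |f|
  1.upto(100) do |page|
    urltext = "http://xxxxxxx.com/page/#{page}/"
    # open-uri's open; on Ruby 3.0+ this must be written URI.open(urltext)
    doc = Nokogiri::HTML(open(urltext))
    doc.css(".photoPost").each do |post|
      quote  = post.css("h1 + p").text      # the <p> immediately following an <h1>
      author = post.css("h1 + p + p").text  # the <p> after that one
      f.puts "#{quote}#{author}"
      f.puts "--------------------------------------------------------"
    end
  end
end
```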
When running this script, I get the following error:
```
http.rb:2030:in `read_status_line': wrong status line: "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"" (Net::HTTPBadResponse)
```
However, my script writes to the file correctly; it's just that this error keeps coming up. What does the error mean?
Without knowing what site you are accessing, it is hard to say for sure, but I suspect that the problem isn't in Nokogiri.
The error is being reported by `http.rb`, which is most likely complaining about the status line and headers the server returns. `http.rb` handles the HTTP exchange with the server and will complain about a missing or malformed status line or headers, but it doesn't care about the payload.
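To illustrate the split, here is a minimal sketch in plain Ruby (no network) of how a well-formed HTTP response is laid out: a status line, header lines, a blank line, and then the body. The response text and the `split_response` helper are hypothetical, made up for illustration, and this is not how net/http parses responses internally; it only shows which part `http.rb` checks and which part ends up in Nokogiri.

```ruby
# A hand-written, well-formed HTTP response (hypothetical example data).
raw = "HTTP/1.1 200 OK\r\n" \
      "Content-Type: text/html\r\n" \
      "\r\n" \
      "<!DOCTYPE html>\n<html><body><p>Hello</p></body></html>\n"

# Hypothetical helper: separate the part http.rb validates (status line and
# headers) from the payload that Nokogiri would parse.
def split_response(raw)
  head, body = raw.split("\r\n\r\n", 2)       # blank line ends the headers
  status_line, *header_lines = head.split("\r\n")
  headers = header_lines.map { |line| line.split(": ", 2) }.to_h
  [status_line, headers, body]
end

status_line, headers, body = split_response(raw)
puts status_line              # prints HTTP/1.1 200 OK
puts headers["Content-Type"]  # prints text/html
puts body.lines.first         # prints <!DOCTYPE html>
```

The DOCTYPE only ever appears after the blank line, in the body; a client that finds it on the very first line has not received a valid HTTP response.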
Nokogiri, on the other hand, only sees the payload, i.e., the HTML. The DOCTYPE is supposed to be part of that HTML payload, so I suspect their server is sending the HTML (starting with its DOCTYPE) where `http.rb` expects the status line, such as `HTTP/1.1 200 OK`, and the headers, including a MIME type such as `Content-Type: text/html`.
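A sketch of the check that fails: the pattern below is modeled on the one net/http applies in `read_status_line` (the exact regex varies between Ruby versions), and `parse_status_line` is a hypothetical helper. It accepts a normal status line and rejects the DOCTYPE line with a message in the same shape as the one in the question.

```ruby
require 'net/http'  # defines Net::HTTPBadResponse

# Pattern modeled on net/http's status-line check: "HTTP", optional version,
# a three-digit code, and an optional reason phrase.
STATUS_LINE = %r{\AHTTP(?:/(\d+\.\d+))?\s+(\d\d\d)(?:\s+(.*))?\z}i

# Hypothetical helper: returns [version, code, reason] or raises the same
# exception class net/http raises for a malformed first line.
def parse_status_line(line)
  m = STATUS_LINE.match(line) or
    raise Net::HTTPBadResponse, "wrong status line: #{line.dump}"
  m.captures
end

p parse_status_line("HTTP/1.1 200 OK")  # prints ["1.1", "200", "OK"]

begin
  parse_status_line('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"')
rescue Net::HTTPBadResponse => e
  puts e.message
  # prints: wrong status line: "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\""
end
```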
In the Ruby 1.8.7 `http.rb` file, you'll see the following lines near line 2030:
```ruby
def response_class(code)
  CODE_TO_OBJ[code] or
  CODE_CLASS_TO_OBJ[code[0,1]] or
  HTTPUnknownResponse
end
```
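That is where the parsed status code is turned into a response class. The message itself is raised one step earlier, in `read_status_line` (the method named in your backtrace), when the first line the server sends back doesn't look like an HTTP status line; in your case that first line is the DOCTYPE.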
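To confirm that the message comes from net/http rather than Nokogiri, here is a sketch that reproduces it locally, assuming the site behaves as suspected above: a throwaway `TCPServer` on 127.0.0.1 answers with bare HTML and no status line, and `Net::HTTP` raises `Net::HTTPBadResponse` before any HTML reaches a parser. The `fetch_or_nil` wrapper is a hypothetical helper showing one way to skip such a page instead of aborting the whole loop; whether skipping is acceptable depends on what the scraper needs.

```ruby
require 'net/http'
require 'socket'

# Throwaway server that mimics the suspected misbehaving site: it replies with
# HTML but omits the "HTTP/1.1 200 OK" status line and all headers.
server = TCPServer.new('127.0.0.1', 0)  # port 0 lets the OS pick a free port
port = server.addr[1]

responder = Thread.new do
  client = server.accept
  # Consume the request headers (up to the blank line) before replying.
  while (line = client.gets) && line != "\r\n"
  end
  client.write("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\">\n<html></html>\n")
  client.close
end

# Hypothetical helper: return the body, or nil (with a warning on stderr)
# when the server's response is malformed.
def fetch_or_nil(uri)
  Net::HTTP.get(uri)
rescue Net::HTTPBadResponse => e
  warn "skipping #{uri}: #{e.message}"
  nil
end

body = fetch_or_nil(URI("http://127.0.0.1:#{port}/"))
responder.join
server.close
p body  # prints nil
```

In the original script the same `rescue` could wrap the `open(urltext)` call, since open-uri uses Net::HTTP underneath and the exception surfaces there.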