Parsing malformed HTML with Mechanize (Ruby)

Asked by Ben G
Tags: ruby, web-scraping, mechanize

I'd like to process an HTTP response that contains a lot of HTML but is not itself a valid HTML file.

I'm aware that I could use Nokogiri as follows: page = Nokogiri::HTML.parse(page.body). However, I'd like to have access to the Mechanize methods like Mechanize::Page.search. Is there any way to work with this HTML as a Mechanize::Page, or through some other Mechanize class?

Answers (1)

Answered by Ben G (Upvotes: 2)

Actually, it looks like I've found the answer to my own question. Build the Mechanize::Page directly from the response body:

    page = Mechanize::Page.new(URI.parse('http://example.com'), { 'content-type' => 'text/html' }, page.body, 200, agent)
