Jasmine Lognnes

Reputation: 7087

How to parse/download stylesheet from HTML

I am downloading part of an HTML page by:

require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open('https://example.com/index.html'))
wiki = doc.xpath('//*[@id="wiki"]/div[1]')

I need the stylesheets in order to display it correctly. They are included in the head like so:

<!DOCTYPE html>
<html lang="en" class="">
  <head>
    ...
    <link href="https://example.com/9f40a.css" media="all" rel="stylesheet" />
    <link href="https://example.com/4e5fb.css" media="all" rel="stylesheet" />
    ...
  </head>
  ...

Their file names can change. How do I find the stylesheets in the HTML and download local copies of them?

Upvotes: 3

Views: 404

Answers (2)

mrbrdo

Reputation: 8258

Something like this:

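require 'open-uri'

# `doc` is the Nokogiri document parsed in the question.
doc.css("head link").each do |tag|
  link = tag["href"]
  # Skip <link> tags that have no href or do not point at a CSS file.
  next unless link && link.end_with?("css")
  # Save each stylesheet under /tmp, keeping its remote file name.
  # Assumes absolute hrefs, as in the question.
  File.open("/tmp/#{File.basename(link)}", "wb") do |f|
    # URI.open replaces the bare open(url), which Ruby 3.0 removed.
    content = URI.open(link) { |g| g.read }
    f.write(content)
  end
end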

Upvotes: 4

deimus

Reputation: 9865

I'm not a Ruby expert, but you could follow these steps:

  • Use the String#scan method to pull the .css file names out of the HTML. scan returns an array of the matching stylesheet file names.
  • Then download each file with Net::HTTP.get(...) and store it locally.
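
The two steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the method names stylesheet_urls and download_stylesheets are made up for illustration, the regex assumes quoted href attributes holding absolute URLs that end in .css, and Net::HTTP.get does not follow redirects. A regex is also more fragile than a real HTML parser, so treat it as a starting point.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: return the href of every <link> tag whose href
# ends in ".css". Assumes the href value is wrapped in single or double quotes.
def stylesheet_urls(html)
  html.scan(/<link\b[^>]*\bhref=["']([^"']+\.css)["'][^>]*>/i).flatten
end

# Hypothetical helper: download each stylesheet into dir, keeping the
# remote file name. Assumes every URL is absolute and needs no redirect.
def download_stylesheets(urls, dir)
  urls.each do |url|
    uri = URI(url)
    css = Net::HTTP.get(uri)
    File.binwrite(File.join(dir, File.basename(uri.path)), css)
  end
end
```

With the HTML from the question in a string named html, calling download_stylesheets(stylesheet_urls(html), '/tmp') should save 9f40a.css and 4e5fb.css under /tmp.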

Upvotes: 1
