Reputation: 7087
I am downloading part of an HTML page like this:
require 'nokogiri'
require 'open-uri'
# URI.open (not Kernel#open) is required for URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open('https://example.com/index.html'))
wiki = doc.xpath('//*[@id="wiki"]/div[1]')
I need the stylesheets to display it correctly. They are linked in the page's <head> like so:
<!DOCTYPE html>
<html lang="en" class="">
<head>
...
<link href="https://example.com/9f40a.css" media="all" rel="stylesheet" />
<link href="https://example.com/4e5fb.css" media="all" rel="stylesheet" />
...
</head>
...
Their file names can change, so I can't hard-code them. How do I parse the page and download local copies of the stylesheets?
Upvotes: 3
Views: 404
Reputation: 8258
Something like this:
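require 'open-uri'

# Save every <link> in <head> whose href ends in "css" to /tmp
doc.css("head link").each do |tag|
  link = tag["href"]
  next unless link && link.end_with?("css")
  File.open("/tmp/#{File.basename(link)}", "w") do |f|
    # URI.open (not Kernel#open) is needed to fetch URLs on Ruby 3.0+
    content = URI.open(link) { |g| g.read }
    f.write(content)
  end
end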
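The loop above assumes every href is an absolute URL and that the file name carries no query string (e.g. 9f40a.css?v=2). A minimal sketch of a helper that resolves the href against the page URL with URI.join and takes the local file name from the URL path only; the helper name stylesheet_target and the /tmp default directory are illustrative assumptions, not part of the answer:

```ruby
require 'uri'

# Hypothetical helper: resolve an href (absolute or relative) against the
# page URL and pick a local file name from the URL path, ignoring any query.
def stylesheet_target(href, page_url, dir = "/tmp")
  absolute = URI.join(page_url, href)   # "a.css", "/a.css" and "https://..." all work
  name = File.basename(absolute.path)   # "9f40a.css" even for "9f40a.css?v=2"
  [absolute.to_s, File.join(dir, name)]
end

# Usage inside the loop above (download still done with open-uri):
#   url, path = stylesheet_target(tag["href"], "https://example.com/index.html")
#   File.write(path, URI.open(url, &:read))
```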
Upvotes: 4
Reputation: 9865
I'm not a Ruby expert, but you can follow these steps:
1. Use the String#scan(...) method to parse the page and get the .css file names. scan returns an array of the stylesheet file names. Find more info on scan here.
2. Download each file with Net::HTTP.get(...); an example is here.
Upvotes: 1
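The two steps in this answer can be sketched with only the standard library. This assumes each stylesheet is a single <link ...> tag with rel="stylesheet" and a quoted href; regex scanning is brittle compared with a real parser such as Nokogiri (used in the question), and Net::HTTP.get does not follow redirects. The names stylesheet_urls and download_stylesheets are hypothetical:

```ruby
require 'net/http'
require 'uri'

# Regex patterns (assumptions): one <link> tag, a quoted href, rel="stylesheet".
LINK_TAG       = /<link\b[^>]*>/i
HREF_ATTR      = /\bhref\s*=\s*["']([^"']+)["']/i
STYLESHEET_REL = /\brel\s*=\s*["']?stylesheet["']?/i

# Step 1: String#scan collects the <link> tags; keep stylesheets, extract
# their hrefs and resolve them against the page URL.
def stylesheet_urls(html, page_url)
  html.scan(LINK_TAG)
      .select { |tag| tag.match?(STYLESHEET_REL) }
      .filter_map { |tag| tag[HREF_ATTR, 1] }
      .map { |href| URI.join(page_url, href).to_s }
end

# Step 2: fetch the page and each stylesheet with Net::HTTP.get and save
# them under dir, named after the last path segment.
def download_stylesheets(page_url, dir = "/tmp")
  html = Net::HTTP.get(URI(page_url))
  stylesheet_urls(html, page_url).each do |url|
    uri = URI(url)
    File.write(File.join(dir, File.basename(uri.path)), Net::HTTP.get(uri))
  end
end
```

For the page in the question, download_stylesheets("https://example.com/index.html") would save 9f40a.css and 4e5fb.css to /tmp, whatever they happen to be named at the time.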