Reputation: 16821
As you all know, external resources such as images can be embedded into an HTML file using base64 encoding:
<img src="data:image/png;base64,iVBORw0KGgoAAAANS..." />
I'm looking for a pure browser-based JavaScript way to traverse an HTML page and embed all the external resources into the file, so that when I call $("html").html(), it returns all of the page's contents, including its external resources.
Just so it makes sense, I'm trying to download web pages into single files using a headless browser on my server.
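To make the goal concrete, here is a minimal sketch of what such an in-page pass might look like for images only. The helper names `toDataUri` and `inlineImages` are hypothetical, and the sketch assumes the browser provides `fetch` and `btoa` and that the images are same-origin or served with permissive CORS headers. Stylesheets, scripts, fonts and CSS `url(...)` references would need similar handling and are not covered here.

```javascript
// Encode raw bytes as a base64 data URI, e.g. "data:image/png;base64,...".
// Hypothetical helper, not part of any library.
function toDataUri(bytes, mimeType) {
  let binary = "";
  // Build the binary string in chunks so large files do not exceed
  // the engine's limit on the number of function arguments.
  const chunk = 0x8000;
  for (let i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunk));
  }
  return "data:" + mimeType + ";base64," + btoa(binary);
}

// Replace the src of every <img> in `doc` with an inlined data URI.
// Images that cannot be fetched (network error, CORS) keep their original URL.
async function inlineImages(doc) {
  const images = Array.from(doc.querySelectorAll("img[src]"));
  await Promise.all(images.map(async (img) => {
    if (img.src.startsWith("data:")) return; // already inlined
    try {
      const response = await fetch(img.src);
      if (!response.ok) return;
      // Drop parameters such as "; charset=..." from the header value.
      const header = response.headers.get("Content-Type") || "application/octet-stream";
      const mimeType = header.split(";")[0].trim();
      const bytes = new Uint8Array(await response.arrayBuffer());
      img.setAttribute("src", toDataUri(bytes, mimeType));
    } catch (err) {
      // Leave the original URL in place on failure.
    }
  }));
}
```

In a page this could be run as `await inlineImages(document)` before reading `$("html").html()` or `document.documentElement.outerHTML`.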
Upvotes: 15
Views: 8342
Reputation: 31666
There is the Save Page WE extension for Firefox and Chrome:
This extension can scroll or zoom out the page in order to allow fetching lazy-loading resources before saving.
monolith
(Rust) CLI tool for saving complete web pages as a single HTML file
Install
# any platform with rustc installed
cargo install monolith
# on macos
brew install monolith
# on windows
choco install monolith
obelisk
(Go) Go package and CLI tool for saving a web page as a single HTML file
# any platform with go sdk installed
go install -v github.com/go-shiori/obelisk/cmd/obelisk@latest
binaries: https://github.com/go-shiori/obelisk/releases
inliner
inliner is an npm module that exposes the inliner CLI utility. It works with some URLs but throws errors with others. It writes its output to stdout, so redirect it to a file, e.g. inliner https://http.cat > cats.html
It can be installed with (assuming you have nodejs+npm):
npm install -g inliner
Upvotes: 1
Reputation: 10006
There are tools out there to do that. Examples:
While there are benefits to this approach, remember that a page visited more than once, or a site whose pages share the same JS/CSS files, benefits from client-side (browser) caching, which is lost when everything is inlined.
Upvotes: 13