Mehran

Embedding all the external resources of an HTML page into a single file using javascript in the browser

As you all know, external resources such as images can be embedded into an HTML file as base64-encoded data URIs:

<img src="..." />

I'm looking for a pure browser-based JavaScript way to traverse an HTML page and embed all of its external resources into the file, so that when I call $("html").html(), it returns all of the page's content, including its external resources.

Just so it makes sense: I'm trying to save web pages as single files using a headless browser on my server.
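
A minimal browser-side sketch of that traversal, assuming every resource can be fetched from the page's own context (same origin, CORS headers, or a headless browser started with web security disabled). It only handles img src and link rel="stylesheet"; srcset, url() references inside CSS, fonts, scripts and iframes are left untouched. The function names (toDataUri, tryFetch, inlineImages, inlineStylesheets, snapshotPage) are hypothetical, and the fetchImpl parameter exists only so a fake fetch can be substituted.

```javascript
// Hypothetical helpers: inline <img> sources as base64 data: URIs and replace
// <link rel="stylesheet"> elements with <style> blocks, then serialize the page.
// Assumes every resource is fetchable from the page's context (same origin,
// CORS-enabled, or a headless browser with web security disabled).

// Encode raw bytes as a data: URI such as "data:image/png;base64,...".
function toDataUri(bytes, mimeType) {
  let binary = "";
  const chunkSize = 0x8000; // keep String.fromCharCode's argument list small
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Fetch a URL, returning null instead of throwing on network or HTTP errors.
async function tryFetch(url, fetchImpl) {
  try {
    const response = await fetchImpl(url);
    return response.ok ? response : null;
  } catch {
    return null;
  }
}

async function inlineImages(doc, fetchImpl) {
  const images = Array.from(doc.querySelectorAll("img[src]"));
  await Promise.all(images.map(async (img) => {
    if (img.src.startsWith("data:")) return; // already embedded
    const response = await tryFetch(img.src, fetchImpl); // img.src is absolute
    if (!response) return; // keep the original URL if the fetch fails
    const mimeType = (response.headers.get("content-type") || "application/octet-stream")
      .split(";")[0]
      .trim();
    const bytes = new Uint8Array(await response.arrayBuffer());
    img.setAttribute("src", toDataUri(bytes, mimeType));
  }));
}

async function inlineStylesheets(doc, fetchImpl) {
  const links = Array.from(doc.querySelectorAll('link[rel="stylesheet"][href]'));
  await Promise.all(links.map(async (link) => {
    const response = await tryFetch(link.href, fetchImpl);
    if (!response) return;
    const style = doc.createElement("style");
    // Relative url(...) references inside the CSS are NOT rewritten here.
    style.textContent = await response.text();
    link.replaceWith(style);
  }));
}

async function snapshotPage(doc = document, fetchImpl = fetch) {
  await inlineImages(doc, fetchImpl);
  await inlineStylesheets(doc, fetchImpl);
  // outerHTML includes the <html> element itself; $("html").html() does not.
  return doc.documentElement.outerHTML;
}
```

In a headless browser you would evaluate snapshotPage() inside the page and write the returned string to disk on the server side. The doctype is not part of outerHTML, so it has to be prepended separately if you need it.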

Answers (2)

ccpizza

Browser extensions

Save Page WE is an extension for Firefox and Chrome.

It can scroll or zoom out the page so that lazy-loaded resources are fetched before the page is saved.
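
This is not Save Page WE's own code, only a minimal browser-side sketch of the scrolling idea, assuming the page lazy-loads images when they enter the viewport (native loading="lazy" or an IntersectionObserver). The function name scrollToTriggerLazyLoad, the delay and the step limit are hypothetical choices; it would run before any inlining step.

```javascript
// Scroll the page one viewport at a time so lazy-loaded images start loading,
// then return to the top. maxSteps keeps infinite-scroll pages from looping forever.
async function scrollToTriggerLazyLoad(win = window, delayMs = 250, maxSteps = 100) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const step = Math.max(1, win.innerHeight); // guard against a zero-height viewport
  for (
    let i = 0, y = 0;
    i < maxSteps && y < win.document.documentElement.scrollHeight;
    i++, y += step
  ) {
    win.scrollTo(0, y);
    await sleep(delayMs); // give observers time to fire and requests time to start
  }
  win.scrollTo(0, 0);
}
```

The delay only gives requests time to start; a real tool would also wait for the images to finish loading before serializing the page.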

Command line tools

monolith (Rust)

CLI tool for saving complete web pages as a single HTML file

Install

# any platform with rustc installed
cargo install monolith

# on macOS (Homebrew)
brew install monolith

# on Windows (Chocolatey)
choco install monolith

obelisk (Go)

Go package and CLI tool for saving web page as single HTML file

# any platform with the Go SDK installed
go install -v github.com/go-shiori/obelisk/cmd/obelisk@latest

Prebuilt binaries: https://github.com/go-shiori/obelisk/releases

inliner

inliner is an npm module that provides the inliner CLI utility. It works with some URLs but throws errors with others. It writes its output to stdout, so the output has to be redirected to a file, e.g. inliner https://http.cat > cats.html.

Assuming you have Node.js and npm, install it with:

npm install -g inliner
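
Since the question runs this on a server, one way to drive a stdout-writing tool like inliner from Node.js is to spawn it and stream its output into a file, the programmatic equivalent of inliner https://http.cat > cats.html. This is a minimal sketch: saveStdoutToFile is a hypothetical helper name, nothing in it uses inliner's own API, and it assumes Node.js 15 or later for stream/promises.

```javascript
const { spawn } = require("child_process");
const fs = require("fs");
const { pipeline } = require("stream/promises");

// Run a command that prints a page to stdout and save that output to outPath,
// like `command args... > outPath` in a shell. stderr is passed through so
// the tool's error messages stay visible.
async function saveStdoutToFile(command, args, outPath) {
  const child = spawn(command, args, { stdio: ["ignore", "pipe", "inherit"] });
  const exitCode = new Promise((resolve, reject) => {
    child.on("error", reject); // e.g. the command is not installed
    child.on("close", resolve); // resolves with the exit code
  });
  // Wait for both the file to be fully written and the process to exit.
  const [, code] = await Promise.all([
    pipeline(child.stdout, fs.createWriteStream(outPath)),
    exitCode,
  ]);
  if (code !== 0) throw new Error(`${command} exited with code ${code}`);
}
```

For example, saveStdoutToFile("inliner", ["https://http.cat"], "cats.html").catch(console.error) rejects when inliner fails on a URL, which matters given that it errors out on some pages.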

JAR.JAR.beans

There are tools out there to do that. Examples:

While there are benefits to this approach, remember that a page visited more than once, or a site whose pages share the same JS/CSS files, benefits from client-side (browser) caching; inlining those resources into every page gives that up.
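
To make the caching trade-off concrete, here is a rough back-of-the-envelope model. It assumes the HTML is re-downloaded on every visit, separately served resources are cached perfectly after the first visit, and no HTTP compression is used; the function name and the numbers are illustrative only.

```javascript
// Rough model of bytes downloaded across several page views.
// Assumptions: HTML is re-downloaded on every visit, separately served
// resources are cached perfectly after the first visit, no compression.
function bytesTransferred({ visits, htmlBytes, resourceBytes, inlined }) {
  if (inlined) {
    // base64 encodes every 3 bytes as 4 characters, and the inlined
    // resources are re-sent with the HTML on every visit
    const encodedBytes = Math.ceil(resourceBytes / 3) * 4;
    return visits * (htmlBytes + encodedBytes);
  }
  // HTML on every visit, resources only once (later visits hit the browser cache)
  return visits * htmlBytes + resourceBytes;
}

console.log(bytesTransferred({ visits: 5, htmlBytes: 20000, resourceBytes: 300000, inlined: true }));  // 2100000
console.log(bytesTransferred({ visits: 5, htmlBytes: 20000, resourceBytes: 300000, inlined: false })); // 400000
```

Under these assumptions, five views of a page with 300 KB of resources cost about five times as many bytes when everything is inlined. The same arithmetic applies to several pages of one site that share the same JS/CSS files.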
