user1785898

Reputation: 167

Convert webarchive to html

I managed to capture the behavior of a complex website in a webarchive. Now I would like to turn that webarchive into a set of nested HTML directories. However, when I tried it both with Waf and with a commercial app bought on the Apple App Store, all I got was the nested directories with the HTML page at the bottom: no images, no CSS, and no working links. If you are interested, the webarchive document is at:

http://www.miafoto.it/it/GiroMilano.webarchive

while the poor result of the extraction is at:

http://www.miafoto.it/it/Giromilano/Pagine/default.aspx

along with the empty directories above it. Besides looking different, the two also behave differently. The webarchive behaves like the official website when a listbox value is selected and the button is pushed, while the extracted version reloads itself instead of loading the official page and shows a page with no content. As you can see, the webarchive is over 1 MB, while the extraction is just over 1 KB.

What is going wrong, and how can I perform this apparently trivial task with usable results?

Thanks,

Upvotes: 13

Views: 23083

Answers (4)

user2407486

Reputation: 49

I found that WebArchiveExtractor.app works on my Mac (macOS Mojave): https://robrohan.github.io/WebArchiveExtractor/

Upvotes: 2

Fariman Kashani

Reputation: 1024

To save HTML pages on a Mac, I use Chrome: download and install it, then save your page as HTML. Safari saves web pages in the webarchive format, which I find very hard to deal with.

Upvotes: 0

alexkovelsky

Reputation: 4188

textutil -convert html example.webarchive
  • Be careful: the HTML file and its resource files are created in the same folder as the webarchive!
  • Also, I had to open the .html file in a text editor and remove the "file:///" prefix from links such as "file:///image.tiff" so that they point to relative paths.
  • Also, not all browsers display .tiff images.
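The manual text-editor step above can be scripted. Here is a minimal Python sketch of it. It assumes the HTML written by `textutil` is UTF-8. It also narrows the replacement to `src`/`href` attribute values rather than every occurrence of "file:///" in the file. `make_links_relative` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches src="file:///..." or href='file:///...' (any case) and captures the
# attribute name plus opening quote so it can be kept in the replacement.
_FILE_URL = re.compile(r'((?:src|href)=["\'])file:///', re.IGNORECASE)


def make_links_relative(html_path: str) -> int:
    """Strip the file:/// prefix from src/href values in place; return the count."""
    path = Path(html_path)
    text = path.read_text(encoding="utf-8")  # assumption: textutil wrote UTF-8
    new_text, count = _FILE_URL.subn(r"\1", text)
    path.write_text(new_text, encoding="utf-8")
    return count
```

After `textutil -convert html example.webarchive`, calling `make_links_relative("example.html")` would turn `src="file:///image.tiff"` into `src="image.tiff"`. The .tiff caveat still applies.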

Who knew we had a Stack Overflow wiki?

Upvotes: 10

user1785898

Reputation: 167

I solved the issue by ignoring the webarchive: I found all the parameters the page submits and submitted them from my own script as well.
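The approach described above can be sketched in Python using only the standard library. This is a hypothetical illustration, not the author's script; `FormFieldCollector` and `submit_form` are made-up names. It assumes the form posts back to the page's own URL, which is typical for ASP.NET `.aspx` pages. On such pages, hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION` must be echoed back. The sketch simplifies browser behavior: it skips `<textarea>` fields and treats an `<option>` without a `value` attribute as an empty string. The clicked button's name and value, plus the chosen listbox value, are passed in `overrides`.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs a browser would submit (simplified)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._select = None  # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("name")
        if tag == "input" and name:
            itype = (a.get("type") or "text").lower()
            if itype in ("submit", "button", "image", "reset", "file"):
                return  # buttons are only sent when clicked; pass via overrides
            if itype in ("checkbox", "radio") and "checked" not in a:
                return  # unchecked boxes are not submitted
            self.fields[name] = a.get("value") or ""
        elif tag == "select" and name:
            self._select = name
        elif tag == "option" and self._select:
            # The first option is the default unless one is marked "selected".
            if self._select not in self.fields or "selected" in a:
                self.fields[self._select] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def submit_form(page_url: str, html: str, overrides: dict) -> bytes:
    """POST the page's own fields (plus overrides) back to page_url."""
    collector = FormFieldCollector()
    collector.feed(html)
    data = {**collector.fields, **overrides}
    req = Request(page_url, data=urlencode(data).encode("ascii"), method="POST")
    with urlopen(req) as resp:
        return resp.read()
```

A typical flow would be: fetch the page once, then call `submit_form(url, html, {"<listbox name>": "<value>", "<button name>": "<button text>"})`. The placeholders stand for the real field names found in the page source.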

Upvotes: 0
