Dr.Bob

Reputation: 1162

Possible to create ZIM file of whole Wiki? (my own, based on mediawiki)

I want to generate an offline ZIM version of our own wiki, which runs on MediaWiki. The Collection extension is a breeze to install, but it only works by selecting individual pages, which can then be combined into a single ZIM file.

But for a wiki with hundreds of pages, selecting pages one by one is impractical. I want a ZIM dump of the whole wiki. I know it's possible, because there is also a ZIM file of the complete Wikipedia.

However, I can't find out how this is done. Is anyone able to help? Thanks in advance!

Upvotes: 9

Views: 6112

Answers (3)

pdeli

Reputation: 534

I don't know to what extent this answer is still relevant, but here goes…

After much trouble, I finally managed to create a ZIM file out of my private MediaWiki-based wiki:

  • I started with this page: OpenZIM - Build your ZIM file
  • I tested all of the listed possibilities but only mwoffliner worked (for me)
  • The installation was done in a VirtualBox (version 6.0.0) Ubuntu 18.10 Desktop guest, hosted on a Mac (macOS Mojave, version 10.14.2)
    • Note that I ended up running the guest OS headless, so the graphical interface became useless. My next step will be to use a server version of Ubuntu
  • After much struggle, I managed to make mwoffliner work but not without the precious help of the developers on GitHub

Below are step-by-step instructions for what I did. Note that the main instructions come from the mwoffliner repository of openZIM on GitHub, so most of the credit for these instructions goes to them.

NodeJS

$ sudo apt install curl
$ curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash && source ~/.bashrc && nvm install stable && node --version

Image Processing & Redis & git & meson & gcc & g++ & pkg-config installation

$ sudo apt install jpegoptim advancecomp gifsicle pngquant imagemagick redis-server git meson g++ pkg-config libzim-dev

libzim-dev: manual upgrade from version 2.0.0 to version >=4.0.0

1- If libzim 2.0.0 (libzim-dev) is already installed, then proceed with uninstalling it, else continue with point 2.

$ sudo apt remove libzim-dev #removes libzim 2.0.0
$ sudo apt purge libzim-dev
$ sudo apt autoremove #removes libzim2

2- Install libzim version >=4.0.0

$ sudo apt install cython3 liblzma-dev libgumbo-dev libicu-dev libmagic-dev libxapian-dev python-dev python-pip python-virtualenv zlib1g-dev
$ git clone https://github.com/openzim/libzim.git
$ cd libzim
$ meson . build
$ ninja -C build
$ sudo ninja -C build install
$ sudo ldconfig
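
To confirm that the freshly built library is the one the system now sees, something like the following sketch may help. It assumes the build installed a pkg-config module named libzim and that GNU sort (with -V) is available; the function names are made up for illustration. If libzim was installed under /usr/local, PKG_CONFIG_PATH may need to include its pkgconfig directory.

```shell
#!/usr/bin/env bash
# Sketch: check that the libzim visible to pkg-config is at least a given version.
# Assumption: the pkg-config module is called "libzim".

# version_ge A B: succeeds when version A >= version B (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_libzim MIN: report whether the detected libzim is at least MIN.
check_libzim() {
    min_version="$1"
    if ! found_version="$(pkg-config --modversion libzim 2>/dev/null)"; then
        echo "libzim not found by pkg-config"
        return 1
    fi
    if version_ge "$found_version" "$min_version"; then
        echo "libzim $found_version is new enough (>= $min_version)"
    else
        echo "libzim $found_version is too old (need >= $min_version)"
        return 1
    fi
}
```

After sourcing the snippet, running check_libzim 4.0.0 should tell you whether the old 2.0.0 package is still being picked up.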

ZimWriterFS Manual installation

(Source)

$ cd ~/Downloads/
$ sudo apt install librsvg2-bin
$ git clone https://github.com/openzim/zimwriterfs.git
$ cd zimwriterfs
$ meson . build
$ ninja -C build
$ sudo ninja -C build install
$ zimwriterfs
zimwriterfs usage page should appear

VirtualBox - Access VirtualBox Guest from host OS

  • (Source)

    1. Start VirtualBox 6.x.x
    2. Menu File
    3. Choose Host Network Manager…
    4. Choose tab DHCP Server
    5. Click Create (upper left corner of the window)
    6. Select Enable Server
    7. Server Address: 192.168.56.2
    8. Server Mask 255.255.255.0
    9. Lower Address Bound: 192.168.56.3
    10. Upper Address Bound: 192.168.56.254
    11. Choose tab Adapter
    12. Verify that "Configure Adapter Manually" is selected and,
    13. IPv4 Address: 192.168.56.1
    14. IPv4 Network Mask: 255.255.255.0
    15. Click Close
    16. Right-click on the guest machine
    17. Select Settings… (or just press cmd-s)
    18. Choose tab Network
    19. Select tab Adapter 2
    20. Click Enable Network Adapter
    21. Attached to: select Host-only Adapter
    22. Name: vboxnet0
    23. Click OK
    24. Start Guest machine
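
For a headless setup, the same configuration can probably be done from the host's command line with VBoxManage. This is only a sketch based on the VirtualBox 6.0 option names as I remember them; YourGuestVM is a placeholder for the guest's name, and the script only prints the commands unless DRY_RUN=0 is set. If vboxnet0 does not exist yet it has to be created first (VBoxManage hostonlyif create), and if a DHCP server is already attached to it, "dhcpserver modify" is needed instead of "dhcpserver add".

```shell
#!/usr/bin/env bash
# Sketch: command-line equivalent of the GUI steps above, run on the host.
# VM_NAME is a placeholder; with DRY_RUN=1 (the default) commands are only printed.
VM_NAME="${VM_NAME:-YourGuestVM}"
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

configure_host_only() {
    # Steps 11-14: static address for the host side of vboxnet0
    run VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1 --netmask 255.255.255.0
    # Steps 4-10: DHCP server handing out 192.168.56.3 to 192.168.56.254
    run VBoxManage dhcpserver add --ifname vboxnet0 --ip 192.168.56.2 \
        --netmask 255.255.255.0 --lowerip 192.168.56.3 --upperip 192.168.56.254 --enable
    # Steps 16-23: attach adapter 2 of the guest to the host-only network
    run VBoxManage modifyvm "$VM_NAME" --nic2 hostonly --hostonlyadapter2 vboxnet0
    # Step 24: start the guest without a window
    run VBoxManage startvm "$VM_NAME" --type headless
}

configure_host_only
```

Review the printed commands first, then rerun with DRY_RUN=0 and VM_NAME set to the real guest name.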

mwoffliner command issued

This command assumes that:

  • The MediaWiki wiki is up and running,
  • VirtualBox assigned the IP address 192.168.56.5 to the guest OS (see the section VirtualBox - Access VirtualBox Guest from host OS above; check the guest's IP address with ifconfig)
  • LocalSettings.php contains $wgServer = "http://192.168.56.5"; (the same IP address as the guest OS)
  • The name of your wiki is YourWiki
  • The MediaWiki folder containing your wiki is in /var/www/html/ (i.e., /var/www/html/YourWiki)
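
The $wgServer assumption in the list above can be checked from the guest's shell. The helper names below are hypothetical, and the sketch assumes the value is written in double quotes on a single line, as in the bullet above.

```shell
#!/usr/bin/env bash
# Sketch: compare $wgServer in LocalSettings.php with the guest's IP address.
# Assumption: the setting looks like  $wgServer = "http://192.168.56.5";

# wg_server_of FILE: print the last double-quoted value assigned to $wgServer.
wg_server_of() {
    grep -E '^[[:space:]]*\$wgServer[[:space:]]*=' "$1" | tail -n 1 | cut -d'"' -f2
}

# check_wg_server FILE IP: succeed when $wgServer points at exactly that IP.
check_wg_server() {
    server="$(wg_server_of "$1")"
    case "$server" in
        *"//$2"|*"//$2"/*|*"//$2":*)
            echo "OK: \$wgServer is $server" ;;
        *)
            echo "MISMATCH: \$wgServer is '$server', expected host $2"
            return 1 ;;
    esac
}
```

With the paths from the list above, the call would be check_wg_server /var/www/html/YourWiki/LocalSettings.php 192.168.56.5.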

The actual command:

mwoffliner --mwUrl=http://192.168.56.5/YourWiki --adminEmail=YOUR_EMAIL_ADDRESS --verbose --redis=redis://127.0.0.1:6379 --mwWikiPath=/ --mwApiPath=api.php --localParsoid
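
Before launching a long run, it may be worth checking that the two services the command depends on are answering. The sketch below assumes the API sits at http://192.168.56.5/YourWiki/api.php (where a default MediaWiki install in that folder would put it) and that Redis listens on 127.0.0.1:6379 as in the command above; api_url and preflight are made-up helper names.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for the mwoffliner run (MediaWiki API and Redis).

# api_url BASE API_PATH: join the two with exactly one slash between them.
api_url() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# preflight BASE API_PATH: check that the MediaWiki API and Redis both respond.
preflight() {
    url="$(api_url "$1" "$2")?action=query&meta=siteinfo&format=json"
    if curl -fsS "$url" >/dev/null; then
        echo "MediaWiki API reachable: $url"
    else
        echo "Cannot reach MediaWiki API: $url"
        return 1
    fi
    if [ "$(redis-cli -h 127.0.0.1 -p 6379 ping 2>/dev/null)" = "PONG" ]; then
        echo "Redis is answering on 127.0.0.1:6379"
    else
        echo "Redis is not answering on 127.0.0.1:6379"
        return 1
    fi
}
```

A call matching the command above would be preflight http://192.168.56.5/YourWiki api.php. If the mwoffliner command itself is not found, the project's README installs it with npm i -g mwoffliner.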

Upvotes: 8

Luis H Cabrejo

Reputation: 316

There are a few tools you may want to test. Some have been removed, but most of them are in development.

The page http://www.openzim.org/wiki/Build_your_ZIM_file describes them: "Here are some notes on how to prepare your materials and use zimwriterfs. The notes are incomplete as they're based on my limited experience using the tool."

For creating a ZIM file from existing HTML content, the same page says: "See http://www.openzim.org/wiki/Zimwriterfs_instructions for an overview and read the section below on zimwriterfs for some additional context."

I have also tried another Windows program called Zim - A Desktop Wiki (http://zim-wiki.org/). It is limited, but you can give it a try. It does the opposite: it converts ZIMs to HTML.

Anyway, let us know how it went. I'm also interested in building my own ZIM files. Good luck.

Upvotes: 1

Nemo

Reputation: 2544

Yes, you can, but it's not easy. Kiwix devs are now working on a Parsoid-based solution: http://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/. Parsoid is, in short, the backend of the MediaWiki VisualEditor; it handles the translation of wikitext to HTML and vice versa. It keeps a cache of HTML versions that can be used for this kind of export. https://www.mediawiki.org/wiki/Parsoid should give some info on how to set it up.

Upvotes: 1
