stockersky

Reputation: 1571

Simply use Python Anaconda without an internet connection

I would like to deploy a python environment on production servers that have no access to the internet.

I discovered the Python Anaconda distribution and installed it to give it a try. The installation directory is 1.6 GB, and I can see in the pkgs directory that a lot of libraries are there.

However, when I try to create an environment, conda does not look in the local directories...

conda create --offline --use-local --dry-run  --name pandas_etl python
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata:
Solving package specifications:
Error:  Package missing in current linux-64 channels:
  - python

So, what is the point of bundling all those libraries if conda needs to fetch them from online repositories? Maybe I am missing something?

I am looking for a kind of "lots of batteries included" Python for convenient deployment.

Note: I use a Linux system and installed the regular Anaconda, not Miniconda.

Upvotes: 15

Views: 30204

Answers (3)

skibee

Reputation: 1332

Another option is to use conda-pack. From its documentation:

On the source machine

  • Pack environment my_env into my_env.tar.gz
    $ conda pack -n my_env

  • Pack environment my_env into out_name.tar.gz
    $ conda pack -n my_env -o out_name.tar.gz

  • Pack environment located at an explicit path into my_env.tar.gz
    $ conda pack -p /explicit/path/to/my_env

On the target machine

  • Unpack environment into directory my_env
    $ mkdir -p my_env
    $ tar -xzf my_env.tar.gz -C my_env

  • Use python without activating or fixing the prefixes.
    Most python libraries will work fine, but things that require prefix cleanups will fail.
    $ ./my_env/bin/python

  • Activate the environment. This adds my_env/bin to your path
    $ source my_env/bin/activate

  • Run python from in the environment
    (my_env)$ python

  • Clean up prefixes from within the active environment. Note that this command can also be run without activating the environment, as long as some version of Python is already installed on the machine.
    (my_env)$ conda-unpack

  • At this point the environment is exactly as if you had installed it here using conda directly. All scripts should work fine.
    (my_env)$ ipython --version

  • Deactivate the environment to remove it from your path
    (my_env)$ source my_env/bin/deactivate

Upvotes: 7

Tomasz Gandor

Reputation: 8833

I have a similar situation and came up with a different solution - maybe less 'Pythonic' ('Condaic'?), but very convenient. It makes some assumptions, but they may describe a common setup, and it may be useful even in your case ;)

Situation / assumptions:

  1. Both the production server and my machine use Linux, anaconda3, and they are the same architecture (in my case: x86_64).

  2. The production server has no Internet.

  3. The machine used for deployment has Internet, and SSH access to production (tunnels, VPNs, whatever).

The trick - which works with my conda 4.3 - is to use sshfs to mount the target environment as one of your own:

# prepare and enter the env 'remotely'
me@development:~/$ mkdir anaconda3/envs/production
me@development:~/$ sshfs [email protected]:anaconda3/envs/production anaconda3/envs/production
me@development:~/$ source ~/anaconda3/bin/activate production

# do the work
(production) me@development:~/$ conda install pandas 

# do the cleanup
(production) me@development:~/$ source deactivate
me@development:~/$ fusermount -u anaconda3/envs/production

The problem comes when you want to play around with the root environment. This is, after all, the anaconda3 directory, and it needs to be treated specially (e.g. the envs only symlink the conda, activate and deactivate executables in the bin/ subdirectory). You can then go "all in" and mount the whole anaconda3 directory, but there's a caveat - the path on your machine must match the one on production!

# prepare and enter anaconda root 'remotely'
me@development:~/$ sudo ln -s /home/me /home/prod_user
me@development:~/$ mv anaconda3 my_anaconda
me@development:~/$ mkdir anaconda3
me@development:~/$ sshfs [email protected]:anaconda3 anaconda3

# activate the root
me@development:~/$ source ~/anaconda3/bin/activate 

# do the work
(root) me@development:~/$ conda install pandas 

# do the cleanup
(root) me@development:~/$ source deactivate
me@development:~/$ fusermount -u anaconda3
me@development:~/$ rmdir anaconda3
me@development:~/$ mv my_anaconda anaconda3

This is what works for me, but I suggest you make a backup of your production environment before experimenting like this.
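A minimal sketch of such a backup, assuming the layout from the example above (ANACONDA_DIR and the env name are placeholders; the mkdir is only there so the snippet is runnable anywhere):

```shell
# Back up the production env into a tarball before mounting over it.
# ANACONDA_DIR is a placeholder; on the server it would be ~/anaconda3.
ANACONDA_DIR="${ANACONDA_DIR:-$HOME/anaconda3}"
mkdir -p "$ANACONDA_DIR/envs/production"   # no-op if the env already exists
tar -czf production_backup.tar.gz -C "$ANACONDA_DIR/envs" production
```

Restoring is the reverse: extract the tarball back into anaconda3/envs.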

Upvotes: 1

stockersky

Reputation: 1571

Well, after playing around with Pandas while reading Fabio Nelli's book 'Python Data Analytics', I realized what an awesome library Pandas is. So, I've been working with Anaconda to make it work in my environment.

1- Download the Anaconda installer and install it (I guess Miniconda will be enough)

2- Make a local channel by mirroring (part of) the anaconda repository

Do not try to download individual packages on your workstation and push them to your offline server: dependencies will not be satisfied. Packages need to be contained in a channel and indexed in metadata files (repodata.json and repodata.json.bz2) to be properly 'glued' together.
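For illustration, each package's entry in repodata.json carries the dependency info that conda solves against - roughly like this (the package name and versions here are made up):

```json
{
  "packages": {
    "pandas-0.19.2-np111py35_1.tar.bz2": {
      "name": "pandas",
      "version": "0.19.2",
      "build": "np111py35_1",
      "depends": ["numpy 1.11*", "python 3.5*"]
    }
  }
}
```

A lone .tar.bz2 copied onto the server has no such index, so conda cannot resolve it.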

I used wget to mirror part of the anaconda repository (https://repo.continuum.io/pkgs/). I used something like this to filter packages so as not to download the whole repo:

wget -r --no-parent --regex-type pcre --reject-regex '(.*py2[67].*)|(.*py[34].*)' https://repo.continuum.io/pkgs/free/linux-64/

Beware of filtering down to something like "only py35" packages. Many packages in the repo don't have a version string in their name, and you would miss them as dependencies.

Well, I guess you can filter more accurately. I fetched about 6 GB of packages!

!!!! Do NOT build a custom channel from the part of the repository you just downloaded !!!! (anaconda custom channels). I tried this at first and got this exception: "RecursionError: maximum recursion depth exceeded while calling a Python object". This is a known problem, discussed by the maintainers here: https://github.com/conda/conda/issues/2371 ==> the metadata maintained in repodata.json and repodata.json.bz2 does not reflect the metadata inside the individual packages. They chose to fix issues by editing only the repository metadata, instead of each package's metadata. So, if you rebuild the channel metadata from the packages, you miss those fixes.

==> So: do not rebuild the channel metadata; just keep the repository metadata (the repodata.json and repodata.json.bz2 contained in the official anaconda repository). Even if the whole repo is not in your new channel, it'll work (at least if you did not filter too much while mirroring ;-) )

3- Test and use your new channel

conda search -c file://Path_to_your_channel/repo.continuum.io/pkgs/free/ --override-channels

NOTE: Do not include your platform architecture in the path. Example: your channel tree is probably /Path_to_your_channel/repo.continuum.io/pkgs/free/linux-64. Just omit your arch (linux-64 in my case); conda will find it.
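In other words, the expected local layout is something like this (using the example path from above):

```
/Path_to_your_channel/repo.continuum.io/pkgs/free/
└── linux-64/
    ├── repodata.json
    ├── repodata.json.bz2
    └── ... the mirrored .tar.bz2 packages ...
```

and the channel URL you give conda stops at free/, one level above linux-64.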

Update:

conda update  -c file://resto/anaconda_repo/repo.continuum.io/pkgs/free/ --override-channels --all

And so on... I guess you can use your system user's conda config file (.condarc) to force using this local channel.
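For example, a minimal .condarc sketch (the channel path is the example one from above; the offline setting additionally stops conda from ever trying the network):

```yaml
channels:
  - file://Path_to_your_channel/repo.continuum.io/pkgs/free/
offline: true
```

With this in place, plain `conda install`/`conda update` commands use the local mirror without the -c / --override-channels flags.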

Hope it helps.

Guillaume

Upvotes: 10
