Reputation: 1571
I would like to deploy a Python environment on production servers that have no access to the internet.
I discovered the Anaconda Python distribution and installed it to give it a try.
The installation directory is 1.6 GB, and I can see in the pkgs directory that a lot of libraries are there.
However, when I try to create an environment, conda does not look in the local directories...
conda create --offline --use-local --dry-run --name pandas_etl python
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata:
Solving package specifications:
Error: Package missing in current linux-64 channels:
- python
So, what is the point of bundling all those libraries if conda needs to fetch them from online repositories? Or am I missing something?
I am looking for a kind of "batteries included" Python for convenient deployment.
Note: I use a Linux system and installed the regular Anaconda, not Miniconda.
Upvotes: 15
Views: 30204
Reputation: 1332
Another option is to use conda-pack.
From the documentation:

On the source machine:

# Pack environment my_env into my_env.tar.gz
$ conda pack -n my_env

# Pack environment my_env into out_name.tar.gz
$ conda pack -n my_env -o out_name.tar.gz

# Pack environment located at an explicit path into my_env.tar.gz
$ conda pack -p /explicit/path/to/my_env

On the target machine:

# Unpack environment into directory my_env
$ mkdir -p my_env
$ tar -xzf my_env.tar.gz -C my_env

# Use python without activating or fixing the prefixes.
# Most python libraries will work fine, but things that require
# prefix cleanups will fail.
$ ./my_env/bin/python

# Activate the environment. This adds my_env/bin to your path
$ source my_env/bin/activate

# Run python from in the environment
(my_env)$ python

# Cleanup prefixes from in the active environment.
# Note that this command can also be run without activating the environment
# as long as some version of python is already installed on the machine.
(my_env)$ conda-unpack

# At this point the environment is exactly as if you installed it here
# using conda directly. All scripts should work fine.
(my_env)$ ipython --version

# Deactivate the environment to remove it from your path
(my_env)$ source my_env/bin/deactivate
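The "prefix cleanup" that conda-unpack performs is essentially rewriting hardcoded paths (such as script shebang lines) that still point at the source machine's prefix. A minimal sketch of the problem, using made-up paths and a sed command standing in for what conda-unpack automates:

```shell
# Simulate an environment built at one prefix...
rm -rf /tmp/old_prefix /tmp/new_prefix
mkdir -p /tmp/old_prefix/bin
printf '#!/tmp/old_prefix/bin/python\nprint("hi")\n' > /tmp/old_prefix/bin/myscript
chmod +x /tmp/old_prefix/bin/myscript

# ...then "moved" elsewhere, as unpacking a tarball on another machine would do.
cp -r /tmp/old_prefix /tmp/new_prefix

# The shebang still points at the old prefix, which need not exist anymore:
head -n 1 /tmp/new_prefix/bin/myscript

# conda-unpack's job, roughly: rewrite such stale prefixes in place.
sed -i 's|/tmp/old_prefix|/tmp/new_prefix|' /tmp/new_prefix/bin/myscript
head -n 1 /tmp/new_prefix/bin/myscript
```

This is why plain Python imports usually work right after untarring, while entry-point scripts fail until conda-unpack has run.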
Upvotes: 7
Reputation: 8833
I have a similar situation and came up with a different solution - maybe less 'Pythonic' ('Condaic'?), but very convenient. It makes some assumptions, but they may describe a common setup, and it may be useful even in your case ;)
Situation / assumptions:
Both the production server and my machine run Linux and anaconda3, and they have the same architecture (in my case: x86_64).
The production server has no Internet access.
The machine used for deployment has Internet access and SSH access to the production server (tunnels, VPNs, whatever).
The trick - which works with my conda 4.3 - is to use sshfs to mount the target environment as one of your own:
# prepare and enter the env 'remotely'
me@development:~/$ mkdir anaconda3/envs/production
me@development:~/$ sshfs [email protected]:anaconda3/envs/production anaconda3/envs/production
me@development:~/$ source ~/anaconda3/bin/activate production
# do the work
(production) me@development:~/$ conda install pandas
# do the cleanup
(production) me@development:~/$ source deactivate
me@development:~/$ fusermount -u anaconda3/envs/production
The problem comes when you want to play around with the root environment. This is, after all, the anaconda3 directory itself, and it needs to be treated specially (e.g. the envs only symlink the conda, activate and deactivate executables in the bin/ subdirectory). You can then go "all in" and mount the whole anaconda3 directory, but there's a caveat - the path on your machine must match the one on production!
# prepare and enter anaconda root 'remotely'
me@development:~/$ sudo ln -s /home/me /home/prod_user
me@development:~/$ mv anaconda3 my_anaconda
me@development:~/$ mkdir anaconda3
me@development:~/$ sshfs [email protected]:anaconda3 anaconda3
# activate the root
me@development:~/$ source ~/anaconda3/bin/activate
# do the work
(root) me@development:~/$ conda install pandas
# do the cleanup
(root) me@development:~/$ source deactivate
me@development:~/$ fusermount -u anaconda3
me@development:~/$ rmdir anaconda3
me@development:~/$ mv my_anaconda anaconda3
This is what works for me, but I suggest you make a backup of your production environment before experimenting like this.
Upvotes: 1
Reputation: 1571
Well, after playing around with Pandas while reading Fabio Nelli's book 'Python Data Analytics', I realized what an awesome library Pandas is. So, I've been working with Anaconda to make it work in my environment.
1- Download the Anaconda installer and install it (I guess Miniconda will be enough)
2- Make a local channel by mirroring (part of) the Anaconda repository
Do not try to download individual packages on your workstation and push them to your offline server: dependencies will not be satisfied. Packages need to be contained in a channel and indexed in the metadata files (repodata.json and repodata.json.bz2) to be properly 'glued' together.
I used wget to mirror part of the Anaconda repository at https://repo.continuum.io/pkgs/ I used something like this to filter out packages in order not to download the whole repo:
wget -r --no-parent -R --regex-type pcre --reject-regex '(.*py2[67].*)|(.*py[34].*)' https://repo.continuum.io/pkgs/free/linux-64/
Beware of filtering with something like "only py35 packages": many packages in the repo don't have a version string in their name, and you would miss them as dependencies.
Well, I guess you can filter more accurately. I fetched about 6 GB of packages!
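To see why a "keep only py35" filter is dangerous, here is a quick sketch. The file names below are hypothetical examples of the channel naming scheme, and the "safer" regex is just an illustration of rejecting unwanted versions rather than whitelisting one:

```python
import re

# Hypothetical package file names from a conda channel listing.
filenames = [
    "pandas-0.19.2-np111py35_1.tar.bz2",   # tagged with a python version
    "numpy-1.11.3-py35_0.tar.bz2",         # tagged with a python version
    "zlib-1.2.8-3.tar.bz2",                # no python version in the name
    "openssl-1.0.2k-1.tar.bz2",            # no python version in the name
]

# A naive "keep only py35" filter...
naive_keep = [f for f in filenames if "py35" in f]
# ...silently drops zlib and openssl, which the py35 packages depend on.
print(naive_keep)

# Rejecting unwanted versions instead keeps untagged packages by default.
reject = re.compile(r"py2[67]|py3[34]")
safer_keep = [f for f in filenames if not reject.search(f)]
print(safer_keep)
```

With the reject approach, version-less dependency packages survive the mirror filter, which is the point being made above.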
!!!! Do NOT build a custom channel from the part of the repository you just downloaded !!!! (anaconda custom channels) I tried this at first and got this exception: "RecursionError: maximum recursion depth exceeded while calling a Python object". This is a known problem: https://github.com/conda/conda/issues/2371 The maintainers discuss it there: the metadata maintained in repodata.json and repodata.json.bz2 does not reflect the metadata in the individual packages. They chose to fix issues by editing only the repository metadata instead of each package's metadata. So, if you rebuild the channel metadata from the packages, you miss those fixes.
==> So: do not rebuild the channel metadata; just keep the repository metadata (the repodata.json and repodata.json.bz2 contained in the official Anaconda repository). Even if the whole repo is not in your new channel, it will work (at least, if you did not filter too much while mirroring ;-) )
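The role of repodata.json can be sketched as follows: the solver works from the channel's index entries, not from the package archives on disk, so a package file that is present but not indexed is invisible to conda. The excerpt below is made up for illustration, but it mimics the real repodata.json shape (a "packages" mapping of file name to metadata with a "depends" list):

```python
# A tiny, made-up excerpt of what a channel's repodata.json contains.
repodata = {
    "info": {"subdir": "linux-64"},
    "packages": {
        "pandas-0.19.2-np111py35_1.tar.bz2": {
            "name": "pandas",
            "version": "0.19.2",
            "depends": ["numpy 1.11*", "python 3.5*"],
        },
        "numpy-1.11.3-py35_0.tar.bz2": {
            "name": "numpy",
            "version": "1.11.3",
            "depends": ["python 3.5*"],
        },
    },
}

# Quick sanity check on a mirror: is every dependency name at least
# present somewhere in the index?
indexed_names = {meta["name"] for meta in repodata["packages"].values()}
missing = set()
for meta in repodata["packages"].values():
    for dep in meta["depends"]:
        dep_name = dep.split()[0]  # "numpy 1.11*" -> "numpy"
        if dep_name not in indexed_names:
            missing.add(dep_name)

print(sorted(missing))  # 'python' is depended on but not indexed here
```

A check like this against your mirrored repodata.json can reveal whether your wget filter dropped something the solver will later ask for.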
3- Test and use your new channel
conda search -c file://Path_to_your_channel/repo.continuum.io/pkgs/free/ --override-channels
NOTE: Do not include your platform architecture in the path. Example: your channel tree is probably /Path_to_your_channel/repo.continuum.io/pkgs/free/linux-64. Just omit your arch (linux-64 in my case); conda will find it out.
Update:
conda update -c file://resto/anaconda_repo/repo.continuum.io/pkgs/free/ --override-channels --all
And so on... I guess you can use the conda configuration file (.condarc) of your system user to force using this local channel.
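For reference, forcing the local channel through the user's ~/.condarc might look like this (the file:// path is a placeholder for your own channel location; `channels` and `offline` are real .condarc keys):

```yaml
# ~/.condarc - use only the local mirror and never hit the network
channels:
  - file:///Path_to_your_channel/repo.continuum.io/pkgs/free/
offline: true
```

With this in place you no longer need -c ... --override-channels on every command.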
Hope it helps.
Guillaume
Upvotes: 10