Reputation: 20117
I'm working on a task that involves analysing the latest version of every package in bioconda. This can't be done using one large environment because solving the dependencies of such a large environment would take days, and may not even guarantee the latest version of each package. For this reason I'm trying to separately install each package in its own conda environment.
To speed this up I'm trying to parallelise as much of this as possible. I'm aware that package installations can't be run concurrently in conda, because each process needs write permission to the package cache. However, it seems to me that creating and then solving the environments can be run concurrently; only the package installation itself has to happen serially.
Now, I can run the solve in parallel by running
    conda install my_package --json --dry-run > plan.json
in each process, which outputs a nice JSON file describing the solve for each environment. If I have the output from this, how can I tell conda "install packages using this already solved execution plan"? I'm envisaging something like
    conda install --plan plan.json
but such a flag doesn't exist.
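For illustration, here is a minimal sketch of reading such a plan back in Python and turning it into exact name=version=build specs. The JSON layout used here (an "actions" object holding a "LINK" list whose entries carry "name", "version" and "build_string") is an assumption that should be checked against real dry-run output for your conda version. The sample values and the pinned_specs helper are made up for the example.

```python
import json

# Illustrative stand-in for the contents of plan.json.
# The layout and all values are assumptions, not real conda output.
SAMPLE_PLAN = """
{
  "actions": {
    "LINK": [
      {"name": "bwa", "version": "0.7.17", "build_string": "example_0"},
      {"name": "zlib", "version": "1.2.11", "build_string": "example_1"}
    ]
  },
  "dry_run": true,
  "success": true
}
"""


def pinned_specs(plan):
    """Return a name=version=build string for every package the plan links."""
    links = plan.get("actions", {}).get("LINK", [])
    return ["{name}={version}={build_string}".format(**pkg) for pkg in links]


specs = pinned_specs(json.loads(SAMPLE_PLAN))
print(specs)
```

Passing specs like these back to conda would still send them through its solver, so this only narrows the solve rather than skipping it.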
How can I separate the solving and installation of conda environments? Or is there perhaps another way to install a large number of conda environments?
Upvotes: 3
Views: 447
Reputation: 20117
Since at least version 4.6, Conda has exposed a beta API to the Solver class. It turns out this lets you do exactly what I need. It is still in beta, so with the disclaimer that this will probably break in future conda releases, you can currently do this in Conda 4.8.x:
    from conda.api import Solver

    # `dir` (the path of the environment) and `lock` (a multiprocessing.Lock
    # shared by all worker processes) are expected to be defined by the caller.

    # Solve the environment, which can be done concurrently
    solver = Solver(
        dir,  # The location of the conda environment
        ["bioconda", "conda-forge"],  # A list of conda channels to use
        specs_to_add=["bwa=0.7.17"],  # A list of packages to install
    )
    transaction = solver.solve_for_transaction()

    # This part must be done serially, so use a multiprocessing.Lock here
    with lock:
        transaction.download_and_extract()
        transaction.execute()
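To show how the two phases fit together across many packages, here is a minimal sketch of the "solve concurrently, install under a lock" pattern. It does not call conda: fake_solve and fake_install are hypothetical stand-ins for the Solver call and the transaction steps above. It also uses threads and a threading.Lock only to stay self-contained, whereas the real CPU-bound solves would more likely run in separate processes sharing a multiprocessing.Lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by every worker, so only one install runs at a time.
install_lock = threading.Lock()


def solve_then_install(package, solve, install):
    """Solve without the lock, then install while holding it."""
    plan = solve(package)      # may overlap with other workers
    with install_lock:         # serialised section
        install(plan)
    return package


def run_all(packages, solve, install, workers=4):
    """Process every package and return their names in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(solve_then_install, pkg, solve, install)
            for pkg in packages
        ]
        return [future.result() for future in futures]


# Hypothetical stand-ins for the conda calls, for demonstration only.
installed = []


def fake_solve(package):
    return "plan for " + package


def fake_install(plan):
    installed.append(plan)


done = run_all(["bwa", "samtools", "bcftools"], fake_solve, fake_install)
print(done, installed)
```

With the real API, the solve callable would wrap Solver(...).solve_for_transaction() and the install callable would wrap download_and_extract() followed by execute().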
Upvotes: 2