Brendan

Reputation: 11

Singularity/Nextflow script won't load TensorFlow from .sif on cluster

I'm trying to run a Singularity/Nextflow script on an HPC cluster. The script uses TensorFlow, which is provided by the Docker image lpryszcz/deeplexicon:latest that I pulled the initial .sif file from. However, whenever I try to import TensorFlow within my Nextflow pipeline like this:

 python3 -c "import tensorflow as tf; print('TensorFlow version:', tf.__version__)" >> ${params.resultsDir}/cuda_paths.txt

I am presented with this error:

  ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
  
  
  Failed to load the native TensorFlow runtime.

I initially attempted to fix this by determining where my libcuda.so file was:

    ldconfig -p | grep libcuda

which returned:

    libcudart.so.10.0 (libc6,x86-64) => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0

I then tried simply adding that directory to LD_LIBRARY_PATH:

    LD_LIBRARY_PATH=/usr/local/cuda-10.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH

However, the issue still persists.
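For what it's worth, my understanding is that libcudart.so.10.0 comes from the CUDA toolkit module, whereas the libcuda.so.1 that TensorFlow is asking for is the driver-side library, and I haven't confirmed where (or whether) that file exists on the compute nodes. The kind of check I assume is needed would look something like this on a GPU node (the paths are only guesses for this cluster):

# Guessed locations: libcuda.so.1 ships with the NVIDIA driver,
# not with the CUDA toolkit, so it won't be under /usr/local/cuda-10.0
ldconfig -p | grep 'libcuda\.so'
find /usr/lib64 /usr/lib /usr/lib/x86_64-linux-gnu -name 'libcuda.so.1' 2>/dev/null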

For more context, the scripts and config for my job are as follows:

The nextflow pipeline script looks like this:

#!/usr/bin/env nextflow 
process demultiplex {
    input:
    
    path fast5, from: params.fast5

    output:

    script:
    if(params.demultiplex)
    """
    mkdir -p ${params.resultsDir}

    python3 -c "import tensorflow as tf; print('TensorFlow version:', tf.__version__)" >> ${params.resultsDir}/cuda_paths.txt

    python3 /deeplexicon/deeplexicon.py dmux -p ${fast5} -f multi -m models/resnet20-final.h5 > ${params.resultsDir}/output.tsv
    """
    else
    """
        echo "Skipped"
    """
}

This always throws the error when trying to import TensorFlow.

And the config file looks like this (with some changes made for anonymity):

params {
    // Path to the fast5 input directory
    fast5 = "/some_path/Deeplexicon/RNA004/fast5_pass"
    resultsDir = "/some_path/Deeplexicon/7_16_2"
    demultiplex = true
}

singularity {
    enabled = true
    autoMounts = false
    cacheDir = '/some_path/work/singularity_cache'

}

tower {
    enabled = false
    endpoint = '-'
    accessToken = 'nextflowTowerToken'
}

process {
    cpus = 1
    executor = 'slurm'
    queue = 'pascal_gpu'
    perJobMemLimit = true

    containerOptions = '--bind (All of the CUDA paths from module show CUDA/10.1.243)'

    withName: demultiplex {
        container = 'deeplexicon_latest.sif'
        clusterOptions = '--gres=gpu:1'
        memory = { params.demultiplex ? 8.GB + (2.GB * (task.attempt-1)) : 2.GB }
        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
        maxRetries = 3
    }
}

Finally, my slurm job submission script looks like this:

#!/bin/bash
#SBATCH --job-name=deeplexicon_RNA002
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=2G
#SBATCH --time=1:00:00
#SBATCH --partition=pascal_gpu
#SBATCH --gres=gpu:1


module purge
module load legacy-software
module load CUDA/10.0.130
module load Java/11.0.20


export APPTAINER_TMPDIR=$SCRATCH_VO_USER/apptainer-tmp
export APPTAINER_CACHEDIR=$SCRATCH_USER/apptainer-cache

export NXF_SINGULARITY_CACHEDIR=/some_path/work/singularity_cache

export LD_LIBRARY_PATH=/usr/local/cuda-10.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH


/some_path/tools/nextflow/nextflow-22.10.0-all -c /some_path/Deeplexicon/nextflow_scripts/deeplexicon.conf run /some_path/Deeplexicon/nextflow_scripts/deeplexicon.nf
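If it helps, one debugging idea (not currently in the pipeline, and assuming ldconfig exists inside the image) would be to dump what the container actually sees into the same cuda_paths.txt, by adding something like this to the demultiplex script block:

# Debugging only: record the container-side library view
echo "LD_LIBRARY_PATH inside container: \$LD_LIBRARY_PATH" >> ${params.resultsDir}/cuda_paths.txt
ldconfig -p | grep libcuda >> ${params.resultsDir}/cuda_paths.txt || true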

I'm really not sure why TensorFlow fails to load when imported inside the container.

Upvotes: 1

Views: 54

Answers (1)

Steve

Reputation: 54562

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

One way might be to add --nv to your singularity.runOptions to enable NVIDIA GPU support, for example:

singularity {
  enabled = true
  runOptions = '--nv'
}
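To test this outside of Nextflow first, you could run the same import with a one-off command; the .sif path below is just a guess based on the cacheDir in your config, so adjust it to wherever the image actually sits:

singularity exec --nv /some_path/work/singularity_cache/deeplexicon_latest.sif \
  python3 -c "import tensorflow as tf; print('TensorFlow version:', tf.__version__)"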

If that fails, you might like to try binding your system's libcuda.so.1 to somewhere in the container's $LD_LIBRARY_PATH. You can use the singularity.runOptions (or docker.runOptions if using Docker) for this. For example:

singularity {
  enabled = true
  runOptions = '--bind /usr/lib/libcuda.so.1:/usr/local/nvidia/lib/libcuda.so.1'
}
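The source path /usr/lib/libcuda.so.1 above is only illustrative; on your cluster the driver library may live somewhere else entirely. If you'd rather keep the bind per-process (alongside what you already pass via containerOptions), the same option should also work there, for example:

process {
  withName: demultiplex {
    containerOptions = '--bind /usr/lib/libcuda.so.1:/usr/local/nvidia/lib/libcuda.so.1'
  }
}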

Tested using Docker:

$ cat main.nf
process demultiplex {

  debug true

  container 'lpryszcz/deeplexicon:latest'

  script:
  """
  python3 -c "import tensorflow as tf; print('TensorFlow version:', tf.__version__)" 
  """
}

workflow {

  demultiplex()
}
$ cat nextflow.config 
docker {
  enabled = true
  runOptions = '-v /usr/lib/libcuda.so.1:/usr/local/nvidia/lib/libcuda.so.1'
}

Results:

$ nextflow run main.nf 

 N E X T F L O W   ~  version 24.04.3

Launching `main.nf` [astonishing_swirles] DSL2 - revision: 4d3ac3440f

executor >  local (1)
[4e/67997d] process > demultiplex [100%] 1 of 1 ✔
TensorFlow version: 1.13.1

Upvotes: 0
