Reputation: 196
I am using Nextflow.io to schedule several thousand analysis jobs and then join the outputs.
Nextflow is a DSL that lets me specify channels and processes, then schedules and runs them. Under the hood it generates a bash script for each process, which is why I'm asking here rather than at https://github.com/nextflow-io/nextflow .
I can provide the full script if needed, but this is a cut-down version:
#!/bin/bash
nxf_stage() {
    true
    # THIS IS WHERE IT BREAKS
    ...
}

nxf_main() {
    trap on_exit EXIT
    trap on_term TERM INT USR1 USR2
    # some more prep here
    nxf_stage
    ...
    wait $pid || nxf_main_ret=$?
    ...
    nxf_unstage
}
$NXF_ENTRY
The purpose of the nxf_stage function is to prepare the files that the process needs. In place of the comment marking where it breaks, there are approximately 76,000 lines like this:
rm -f result_job_073241-D_RGB_3D_3D_side_far_0_2019-03-12_03-25-01.json
followed by the same number of lines like this:
ln -s /home/ubuntu/plantcv-pipeline/work/8d/ffe3d29ee581c09d3d25706c238d1d/result_job_073241-D_RGB_3D_3D_side_far_0_2019-03-12_03-25-01.json result_job_073241-D_RGB_3D_3D_side_far_0_2019-03-12_03-25-01.json
When I try to execute the Nextflow script, I get this error:
Segmentation fault (core dumped)
I was able to narrow it down to that function just by putting echo statements on either side of the call, but nothing in the function looks complicated to me. Indeed, when I stripped everything else away and left the script as ~152,000 bare rm and ln commands at the top level, it ran fine.
Is it possible that a function of this size has a memory footprint large enough to cause the segfault? Each individual command seems small.
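In case it helps anyone poke at this, here is a minimal sketch of that contrast test. The file names, the /tmp/work path, and the counts are illustrative rather than my real inputs, and it creates dangling symlinks, so run it in a scratch directory:

#!/bin/bash
# Emit rm/ln pairs shaped like the generated staging block.
gen_lines() {
    for i in $(seq 1 76000); do
        printf 'rm -f result_%05d.json\n' "$i"
        printf 'ln -s /tmp/work/result_%05d.json result_%05d.json\n' "$i" "$i"
    done
}

gen_lines > flat.sh                                        # bare top-level commands
{ echo 'stage() {'; gen_lines; echo '}'; echo 'stage'; } > wrapped.sh

bash flat.sh      # the top-level variant ran fine for me
bash wrapped.sh   # the function-wrapped variant is the shape that segfaults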
Update:
Output of bash -x:
+ set -x
+ set -e
+ set -u
+ NXF_DEBUG=0
+ [[ 0 > 1 ]]
+ NXF_ENTRY=nxf_main
+ nxf_main
+ trap on_exit EXIT
+ trap on_term TERM INT USR1 USR2
++ dd bs=18 count=1 if=/dev/urandom
++ base64
++ tr +/ 0A
+ export NXF_BOXID=nxf-1qYK72XftztQW4ocxx3Fs1tC
+ NXF_BOXID=nxf-1qYK72XftztQW4ocxx3Fs1tC
+ NXF_SCRATCH=
+ [[ 0 > 0 ]]
+ touch /home/ubuntu/plantcv-pipeline/work/ec/da7ca4e909b2cc4a74ed8963cc5feb/.command.begin
+ set +u
+ set -u
+ [[ -n '' ]]
+ nxf_stage
Segmentation fault (core dumped)
Upvotes: 2
Views: 4555
Reputation: 196
I came upon the solution here: https://stackoverflow.com/a/14471782/5447556
Essentially, the function body was being allocated on the stack and was hitting the soft stack size limit.
Running ulimit -sS unlimited allowed me to increase this limit.
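For reference, a quick way to see where you stand before applying the fix (values vary by system; 8192 KB is a common Linux default for the soft limit):

ulimit -sS               # current soft stack limit, e.g. 8192 (KB)
ulimit -sH               # hard limit; a non-root user can only raise the soft limit up to this
ulimit -sS unlimited     # lift the soft limit for this shell and its children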
Adding beforeScript: 'ulimit -sS unlimited' to my Nextflow process was a successful workaround. It's worth noting that this won't work in extreme cases and is quite clunky; I think the real solution will be for Nextflow to implement a more robust staging process for large input channels.
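If you want to confirm the limit is the culprit for a specific failing task, you can replay the generated wrapper by hand from its work directory: .command.run is the per-task wrapper Nextflow writes alongside the .command.begin seen in the trace, and the path below is the work directory from the trace above.

cd /home/ubuntu/plantcv-pipeline/work/ec/da7ca4e909b2cc4a74ed8963cc5feb
bash .command.run                              # should reproduce the segfault in nxf_stage
( ulimit -sS unlimited; bash .command.run )    # subshell raises the limit, then replays the task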
Upvotes: 3