Segmentation fault (core dumped) in simple (but long) bash function

I am using Nextflow.io to schedule several thousand analysis jobs and then join the outputs.

Nextflow is a DSL that lets me specify channels and processes, then schedule and run them. Under the hood it generates a bash script for each process, which is why I'm posting here rather than at https://github.com/nextflow-io/nextflow .

I can provide the full script if needed, but here is a cut-down version:

#!/bin/bash

nxf_stage() {
    true
    #THIS IS WHERE IT BREAKS
    ...
}

nxf_main() {
    trap on_exit EXIT
    trap on_term TERM INT USR1 USR2
    # some more prep here
    nxf_stage
    ...
    wait $pid || nxf_main_ret=$?
    ...
    nxf_unstage
}

$NXF_ENTRY

The purpose of the nxf_stage function is to prepare the files that the process needs. In place of the comment marking where it breaks are approximately 76,000 lines like this:

rm -f result_job_073241-D_RGB_3D_3D_side_far_0_2019-03-12_03-25-01.json

followed by the same number of lines like this:

ln -s /home/ubuntu/plantcv-pipeline/work/8d/ffe3d29ee581c09d3d25706c238d1d/result_job_073241-D_RGB_3D_3D_side_far_0_2019-03-12_03-25-01.json result_job_073241-D_RGB_3D_3D_side_far_0_2019-03-12_03-25-01.json

When I try to execute the Nextflow script, I get this error:

Segmentation fault (core dumped)

I was able to narrow it down to that function just by putting echo statements on either side, but nothing in the function looks complicated to me. Indeed, when I stripped everything else back and left the script as ~152,000 bare lines of rm and ln commands (no function), it worked fine.

Is it possible that a function of this size has a memory footprint large enough to cause the segfault? Each command by itself seems small.
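
As a sanity check, something like this sketch should exercise the same pattern (the function name, command, and count here are placeholders; my real function is ~76,000 rm lines plus the same number of ln lines):

#!/bin/bash
# generate one function containing ~150,000 trivial commands, then call it
{
    echo 'huge_fn() {'
    for i in $(seq 1 150000); do
        echo '    true'
    done
    echo '}'
    echo 'huge_fn'
} > repro.sh

bash repro.sh   # may print "Segmentation fault" under a low soft stack limit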

Update:

Output of bash -x:

+ set -x
+ set -e
+ set -u
+ NXF_DEBUG=0
+ [[ 0 > 1 ]]
+ NXF_ENTRY=nxf_main
+ nxf_main
+ trap on_exit EXIT
+ trap on_term TERM INT USR1 USR2
++ dd bs=18 count=1 if=/dev/urandom
++ base64
++ tr +/ 0A
+ export NXF_BOXID=nxf-1qYK72XftztQW4ocxx3Fs1tC
+ NXF_BOXID=nxf-1qYK72XftztQW4ocxx3Fs1tC
+ NXF_SCRATCH=
+ [[ 0 > 0 ]]
+ touch /home/ubuntu/plantcv-pipeline/work/ec/da7ca4e909b2cc4a74ed8963cc5feb/.command.begin
+ set +u
+ set -u
+ [[ -n '' ]]
+ nxf_stage
Segmentation fault (core dumped)

Answers (1)

I came upon the solution here: https://stackoverflow.com/a/14471782/5447556

Essentially, the function was being allocated on the stack, and one this large was hitting the soft stack-size limit.

Running ulimit -sS unlimited allowed me to raise this limit.
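
For example (exact values vary by system, and without privileges the soft limit can only be raised as far as the hard limit allows):

ulimit -sS             # print the current soft stack limit (often 8192 KiB)
ulimit -sH             # print the hard limit, the ceiling for the soft one
ulimit -sS unlimited   # raise the soft limit for this shell and its children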

Adding beforeScript: 'ulimit -sS unlimited' to my Nextflow process was a successful workaround. It's worth noting that this won't scale to even more extreme cases, and it is quite clunky. I think the real solution would be for Nextflow to implement a more robust staging process for large input channels.
