Reputation: 9830
I am running a program and it is interrupted by a segmentation fault. The problem is that a core dump file is created, but it has a size of zero.
Has anyone seen such a case, and do you know how to resolve it?
I have enough space on the disk. I have already run ulimit -c unlimited
to remove the limit on core file size, both when running the program interactively and at the top of the submitted batch file, but I still get 0-byte core dump files. The permissions of the folder containing these files are ugo+rw, and the core files that are created have u+rw only.
The program is written in C++ and is submitted to a Linux cluster with the qsub command of Grid Engine; I don't know whether this information is relevant to the question.
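For reference, the relevant part of my batch script looks roughly like this (the program name here is just a placeholder):

#!/bin/bash
ulimit -c unlimited   # remove the core file size limit
./myprog              # the program that crashes with a segmentation fault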
Upvotes: 21
Views: 16089
Reputation: 31
This can happen if you run the program on a mounted drive: the core file cannot be written to the mounted drive and must be written to a local drive instead.
You can copy the program to a local drive and run it from there.
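For example, a minimal sketch, assuming the default core_pattern writes the dump into the current directory (the paths and program name are placeholders):

cd /tmp                        # a directory on the local disk
cp /nfs/mounted/dir/myprog .   # copy the executable from the mounted drive
./myprog                       # the core dump is now written to the local disk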
Upvotes: 3
Reputation: 695
Setting ulimit -c unlimited
turned on generation of dumps.
By default, core dumps were generated in the current directory, which was on NFS.
Setting /proc/sys/kernel/core_pattern
to /tmp/core
helped me solve the problem of empty dumps.
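A minimal sketch of this workaround, assuming root access for writing core_pattern (the %e and %p specifiers, which append the executable name and PID to the file name, are my addition):

ulimit -c unlimited                                               # enable core dumps in the shell or batch script
echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern   # write dumps to local /tmp instead of the NFS working directory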
The comment from Ranjith Ruban, "What is the filesystem that you are using for dumping the core?", helped me to develop this workaround.
Upvotes: 17
Reputation: 3395
You can set resource limits such as the physical memory required by using a qsub
option such as -l h_vmem=6G
to reserve 6 GB of physical memory.
For file blocks you can set h_fsize
to an appropriate value as well.
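For example, a sketch of a submission that sets both limits (the script name and values are placeholders):

qsub -l h_vmem=6G -l h_fsize=10G myjob.sh   # reserve 6 GB of memory and allow the job to create files up to 10 GB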
See the RESOURCE LIMITS section of the queue_conf(5) manpage:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html
s_cpu    The per-process CPU time limit in seconds.
s_core   The per-process maximum core file size in bytes.
s_data   The per-process maximum memory limit in bytes.
s_vmem   The same as s_data (if both are set the minimum is used).
h_cpu    The per-job CPU time limit in seconds.
h_data   The per-job maximum memory limit in bytes.
h_vmem   The same as h_data (if both are set the minimum is used).
h_fsize  The total number of disk blocks that this job can create.
Also, if the cluster uses a local TMPDIR on each node and that is filling up, you can set TMPDIR to an alternate location with more capacity, e.g. an NFS share:
export TMPDIR=<some NFS mounted directory>
Then launch qsub
with the -V
option to export the current environment to the job.
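A small sketch, assuming a hypothetical NFS path and job script name:

export TMPDIR=/nfs/scratch/$USER   # alternate scratch location with more capacity (placeholder path)
qsub -V myjob.sh                   # -V passes the current environment, including TMPDIR, to the job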
One or a combination of the above may help you solve your problem.
Upvotes: 0
Reputation: 219
It sounds like you're using a batch scheduler to launch your executable. Maybe the shell that Torque/PBS is using to spawn your job inherits a different ulimit value? Maybe the scheduler's default config is not to preserve core dumps?
Can you run your program directly from the command line instead?
Or if you add ulimit -c unlimited
and/or ulimit -s unlimited
to the top of your PBS batch script before invoking your executable, you might be able to override PBS's default ulimit behavior. Adding a plain ulimit -c will also report what the limit actually is inside the job.
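For example, a minimal sketch of the top of such a batch script (the executable name is a placeholder):

#!/bin/bash
ulimit -c unlimited   # remove the core file size limit
ulimit -s unlimited   # remove the stack size limit as well
ulimit -c             # print the effective core limit into the job's output for checking
./myprog              # hypothetical executable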
Upvotes: 7