Reputation: 9830
I am running a program and it is interrupted by a segmentation fault. The problem is that a core dump file is created, but it has a size of zero.
Has anyone seen such a case, and do you know how to resolve it?
I have enough space on the disk. I have already run ulimit -c unlimited
to remove the limit on core file size, both when running the program interactively and at the top of the submitted batch file, but I still get 0-byte core dump files. The permissions of the folder containing these files are ugo+rw, and the core files that are created have u+rw only.
The program is written in C++ and is submitted to a Linux cluster with the qsub command of Grid Engine; I don't know whether this information is relevant to the question.
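For reference, the relevant part of my batch script looks roughly like this (the program name here is just a placeholder):

#!/bin/bash
ulimit -c unlimited   # remove the core file size limit
./myprog              # the program that crashes with a segmentation fault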
Upvotes: 21
Views: 16089
Reputation: 31
This can happen if you run the program on a mounted drive: the core file cannot be written to the mounted drive and must be written to a local drive instead.
You can copy the program to a local drive and run it from there.
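For example, a minimal sketch, assuming the default core_pattern writes the dump into the current directory (the paths and program name are placeholders):

cd /tmp                        # a directory on the local disk
cp /nfs/mounted/dir/myprog .   # copy the executable from the mounted drive
./myprog                       # the core dump is now written to the local disk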
Upvotes: 3
Reputation: 695
Setting ulimit -c unlimited
turned on generation of dumps.
By default, core dumps were generated in the current directory, which was on NFS.
Setting /proc/sys/kernel/core_pattern
to /tmp/core
helped me solve the problem of empty dumps.
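A minimal sketch of this workaround, assuming root access for writing core_pattern (the %e and %p specifiers, which append the executable name and PID to the file name, are my addition):

ulimit -c unlimited                                               # enable core dumps in the shell or batch script
echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern   # write dumps to local /tmp instead of the NFS working directory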
The comment from Ranjith Ruban, "What is the filesystem that you are using for dumping the core?", helped me to develop this workaround.
Upvotes: 17
Reputation: 3395
You can set resource limits such as the physical memory required by using a qsub
option such as -l h_vmem=6G
to reserve 6 GB of physical memory.
For file blocks you can set h_fsize
to an appropriate value as well.
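For example, a sketch of a submission that sets both limits (the script name and values are placeholders):

qsub -l h_vmem=6G -l h_fsize=10G myjob.sh   # reserve 6 GB of memory and allow the job to create files up to 10 GB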
See the RESOURCE LIMITS section of the queue_conf(5) manpage:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html
s_cpu    The per-process CPU time limit in seconds.
s_core   The per-process maximum core file size in bytes.
s_data   The per-process maximum memory limit in bytes.
s_vmem   The same as s_data (if both are set the minimum is used).
h_cpu    The per-job CPU time limit in seconds.
h_data   The per-job maximum memory limit in bytes.
h_vmem   The same as h_data (if both are set the minimum is used).
h_fsize  The total number of disk blocks that this job can create.
Also, if the cluster uses a local TMPDIR on each node and that is filling up, you can set TMPDIR to an alternate location with more capacity, e.g. an NFS share:
export TMPDIR=<some NFS mounted directory>
Then launch qsub
with the -V
option to export the current environment to the job.
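A small sketch, assuming a hypothetical NFS path and job script name:

export TMPDIR=/nfs/scratch/$USER   # alternate scratch location with more capacity (placeholder path)
qsub -V myjob.sh                   # -V passes the current environment, including TMPDIR, to the job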
One or a combination of the above may help you solve your problem.
Upvotes: 0
Reputation: 219
It sounds like you're using a batch scheduler to launch your executable. Maybe the shell that Torque/PBS is using to spawn your job inherits a different ulimit value? Maybe the scheduler's default config is not to preserve core dumps?
Can you run your program directly from the command line instead?
Or if you add ulimit -c unlimited
and/or ulimit -s unlimited
to the top of your PBS batch script before invoking your executable, you might be able to override PBS's default ulimit behavior. Adding a plain ulimit -c will also report what the limit actually is inside the job.
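For example, a minimal sketch of the top of such a batch script (the executable name is a placeholder):

#!/bin/bash
ulimit -c unlimited   # remove the core file size limit
ulimit -s unlimited   # remove the stack size limit as well
ulimit -c             # print the effective core limit into the job's output for checking
./myprog              # hypothetical executable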
Upvotes: 7