Similar to Getting SEGV / core dumped when running Perl debugger, but still different:
At a specific development stage of my complex program (sorry, I can't provide an MRE because the program processes results retrieved from an LDAP server), perl dumps core when run under the debugger with a breakpoint set on a specific function that calls an external program. (The whole point of the breakpoint was to verify exactly which program would be called and with what parameters.)
Before going into detail, here is the core dump with some details (perl 5.18.2, BTW):
Message: Process 2369 (perl) of user 1025 dumped core.
Stack trace of thread 2369:
#0 0x00007fc575680397 kill (libc.so.6)
#1 0x00000000004fe592 Perl_apply (perl)
#2 0x00000000004f2863 Perl_pp_chown (perl)
#3 0x00000000004a4166 Perl_runops_standard (perl)
#4 0x00000000004370c8 Perl_call_sv (perl)
#5 0x0000000000493c24 Perl_sighandler (perl)
#6 0x00007fc5757fece0 __restore_rt (libpthread.so.0)
#7 0x00000000004a47c0 Perl_pp_const (perl)
#8 0x00000000004a4166 Perl_runops_standard (perl)
#9 0x000000000043e041 perl_run (perl)
#10 0x000000000041e9d3 main (perl)
#11 0x00007fc57566bac5 __libc_start_main (libc.so.6)
#12 0x000000000041ea0b _start (perl)
Reading symbols from /usr/bin/perl...
(No debugging symbols found in /usr/bin/perl)
warning: Can't open file /run/nscd/dbslq7wC (deleted) during file-backed mapping note processing
[New LWP 2369]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `perl -d ../ldap-user-check --ldap-suffix=dc=REDACTED'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fc575680397 in kill () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install perl-base-debuginfo-5.18.2-12.26.1.x86_64
(gdb) frame 7
#7 0x00000000004a47c0 in Perl_pp_const ()
(gdb) disassemble
Dump of assembler code for function Perl_pp_const:
0x00000000004a4780 <+0>: push %rbx
0x00000000004a4781 <+1>: mov (%rdi),%rax
0x00000000004a4784 <+4>: mov %rdi,%rbx
0x00000000004a4787 <+7>: mov 0x20(%rdi),%rdx
0x00000000004a478b <+11>: sub %rax,%rdx
0x00000000004a478e <+14>: cmp $0x7,%rdx
0x00000000004a4792 <+18>: jle 0x4a47d4 <Perl_pp_const+84>
0x00000000004a4794 <+20>: mov 0x8(%rbx),%rcx
0x00000000004a4798 <+24>: lea 0x8(%rax),%rsi
0x00000000004a479c <+28>: mov 0x28(%rcx),%rdx
0x00000000004a47a0 <+32>: test %rdx,%rdx
0x00000000004a47a3 <+35>: je 0x4a47b8 <Perl_pp_const+56>
0x00000000004a47a5 <+37>: mov %rdx,0x8(%rax)
0x00000000004a47a9 <+41>: mov 0x8(%rbx),%rax
0x00000000004a47ad <+45>: mov %rsi,(%rbx)
0x00000000004a47b0 <+48>: pop %rbx
0x00000000004a47b1 <+49>: mov (%rax),%rax
0x00000000004a47b4 <+52>: ret
0x00000000004a47b5 <+53>: nopl (%rax)
0x00000000004a47b8 <+56>: mov 0x18(%rcx),%rcx
0x00000000004a47bc <+60>: mov 0x10(%rbx),%rdx
=> 0x00000000004a47c0 <+64>: mov (%rdx,%rcx,8),%rdx
0x00000000004a47c4 <+68>: mov %rdx,0x8(%rax)
0x00000000004a47c8 <+72>: mov 0x8(%rbx),%rax
0x00000000004a47cc <+76>: mov %rsi,(%rbx)
0x00000000004a47cf <+79>: pop %rbx
0x00000000004a47d0 <+80>: mov (%rax),%rax
0x00000000004a47d3 <+83>: ret
0x00000000004a47d4 <+84>: mov $0x1,%ecx
0x00000000004a47d9 <+89>: mov %rax,%rdx
0x00000000004a47dc <+92>: mov %rax,%rsi
0x00000000004a47df <+95>: call 0x4d6860 <Perl_stack_grow>
0x00000000004a47e4 <+100>: jmp 0x4a4794 <Perl_pp_const+20>
End of assembler dump.
The line that causes trouble looks like this:
check_exec(safe_exec(qw(/usr/bin/sh -c), join(' ', @cmd)), 0);
The breakpoint was set to safe_exec, which should pass the result of the join as a single parameter to sh -c. check_exec is just a wrapper that checks the exit code and error output that safe_exec will return.
In my understanding the debugger should stop before safe_exec starts to examine the parameters; it starts like this:
sub safe_exec(@)
{
    use IO::File;
    my $result = [];
    my $stderr_tmp = IO::File->new_tmpfile();
    my $cmd = join(' ', map {
        my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
    } @_);
    verbose(1, "safe_exec: executing $cmd");
    #...
(verbose() is my debugging routine that basically just outputs its parameters conditionally.)
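To make the escaping step in the excerpt above easier to follow, here is a minimal standalone sketch of just that join/map expression. The sample arguments are hypothetical, and the sketch covers only the part of safe_exec that is visible in the excerpt, not whatever follows the elided lines.

```perl
use strict;
use warnings;

# Hypothetical arguments; the third one contains a backslash and double quotes.
my @args = (qw(/usr/bin/sh -c), q{echo "a\b"});

# Same escaping as in the excerpt: double every backslash first,
# then put a backslash in front of every double quote.
my $cmd = join(' ', map {
    my $arg = $_;
    $arg =~ s/\\/\\\\/g;
    $arg =~ s/"/\\"/g;
    $arg;
} @args);

print "$cmd\n";    # prints: /usr/bin/sh -c echo \"a\\b\"
```

The order of the two substitutions matters: escaping the quotes first would cause the newly added backslashes to be doubled by the second step.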
The really odd facts are:
safe_exec, and I can even step into it without getting a core dump. Like this:
main::send_collected_messages(../ldap-user-check:3516):
3516: check_exec(safe_exec(qw(/usr/bin/sh -c), join(' ', @cmd)), 0);
DB<4> s
main::safe_exec(../ldap-user-check:696):
696: my $result = [];
DB<4> n
main::safe_exec(../ldap-user-check:697):
697: my $stderr_tmp = IO::File->new_tmpfile();
DB<4> n
main::safe_exec(../ldap-user-check:699):
699: my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
DB<4> n
main::safe_exec(../ldap-user-check:698):
698: my $cmd = join(' ', map {
DB<4> n
main::safe_exec(../ldap-user-check:699):
699: my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
#...
DB<4> n
main::safe_exec(../ldap-user-check:699):
699: my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
DB<4> n
main::safe_exec(../ldap-user-check:702):
702: verbose(1, "safe_exec: executing $cmd");
DB<4> n
[1] safe_exec: executing /usr/bin/sh -c /usr/bin/mailx -n -a '/tmp/msg-qhHO9Y.txt' -a '/tmp/msg-fI7Tuk.txt' (...) < '/tmp/msg-nQXQP9.txt'
("..." indicates that more similar arguments were left out.)
The check_exec(safe_exec(...)) construct has been used this way for a long time, and I never had a core dump.
Also, those functions have not been changed for weeks; only the functions using them were changed.
Does anybody have an idea what might cause that? A bug in the garbage collector?
It's getting stranger:
After the stepping session where there was no problem, I deleted all breakpoints and set a breakpoint on safe_exec only, and then it did not dump core!
But when I started the debugger afresh and set a breakpoint on safe_exec only, it dumped core like this:
[5] send_collected_messages: shell command is /usr/bin/mailx -n -a '/tmp/msg-2_OlWE.txt' -a '/tmp/msg-XKWeqq.txt' (...REDACTED DETAILS...) < '/tmp/msg-8J9ywX.txt'
Signal SEGV at ../ldap-user-check line 3516.
main::send_collected_messages('HASH(0x30088d0)') called at ../ldap-user-check line 3773
main::check_accounts('HASH(0x2481f68)', 'HASH(0x2494a80)') called at ../ldap-user-check line 4104
main::main() called at ../ldap-user-check line 4128
./ppolicy-next.sh: line 38: 4335 Aborted (core dumped) perl -d
(./ppolicy-next.sh is a shell script that always calls the debugger with the same parameters.)
Lines 3515 and 3516 are:
verbose(5, "${me}:", 'shell command is', sub { join(' ', @cmd) });
check_exec(safe_exec(qw(/usr/bin/sh -c), join(' ', @cmd)), 0);
In the journal I also find these messages for each core dump:
AT_NULL terminator not found, cannot parse auxv structure.
I think I'm getting closer to solving the riddle:
One of the recent changes was using a tied DB_File hash, and I realized that after the crash the data added to the tied hash is missing from the file.
So I added a sync on the database object.
Interestingly, the program now does not crash any more!
So it seems the crash happens when there is unsynced "dirty" data in the tied hash.
I have never heard of such a thing before, especially as it's completely non-obvious that DB_File might be involved.
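For readers who want to try the same workaround, here is a minimal sketch of a tied DB_File hash with an explicit sync before the external command would run. The file name, the key, and the variable names are assumptions made for illustration; the actual program's tie call is not shown in this post.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical database file in a scratch directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/state.db";

my %state;
my $db = tie %state, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $file: $!";

# The new entry may sit in the library's cache rather than on disk.
$state{last_run} = time;

# Flush cached data to the file; sync returns 0 on success.
my $status = $db->sync;
die "sync failed: $!" if $status != 0;

# ... this is where the external command would be run ...

undef $db;                      # release the object before untie
untie %state;
```

Keeping the object returned by tie is what makes the sync call possible; the tied hash alone offers no method interface.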