U. Windl

Reputation: 4401

Perl dumping core (SEGV) when setting a breakpoint to a specific function

Similar to "Getting SEGV / core dumped when running Perl debugger", but still different:

At a specific development stage of my complex program, perl dumps core when run under the debugger with a breakpoint set on a specific function that calls an external program. (Sorry, I can't provide an MRE, as the program processes results retrieved from an LDAP server.) The whole point of the breakpoint was to verify exactly which program would be called and with what parameters.

Before going into the details, here is the core dump (perl 5.18.2, BTW):

       Message: Process 2369 (perl) of user 1025 dumped core.

                Stack trace of thread 2369:
                #0  0x00007fc575680397 kill (libc.so.6)
                #1  0x00000000004fe592 Perl_apply (perl)
                #2  0x00000000004f2863 Perl_pp_chown (perl)
                #3  0x00000000004a4166 Perl_runops_standard (perl)
                #4  0x00000000004370c8 Perl_call_sv (perl)
                #5  0x0000000000493c24 Perl_sighandler (perl)
                #6  0x00007fc5757fece0 __restore_rt (libpthread.so.0)
                #7  0x00000000004a47c0 Perl_pp_const (perl)
                #8  0x00000000004a4166 Perl_runops_standard (perl)
                #9  0x000000000043e041 perl_run (perl)
                #10 0x000000000041e9d3 main (perl)
                #11 0x00007fc57566bac5 __libc_start_main (libc.so.6)
                #12 0x000000000041ea0b _start (perl)

Reading symbols from /usr/bin/perl...
(No debugging symbols found in /usr/bin/perl)

warning: Can't open file /run/nscd/dbslq7wC (deleted) during file-backed mapping note processing
[New LWP 2369]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `perl -d ../ldap-user-check --ldap-suffix=dc=REDACTED'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fc575680397 in kill () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install perl-base-debuginfo-5.18.2-12.26.1.x86_64
(gdb) frame 7
#7  0x00000000004a47c0 in Perl_pp_const ()
(gdb) disassemble
Dump of assembler code for function Perl_pp_const:
   0x00000000004a4780 <+0>:     push   %rbx
   0x00000000004a4781 <+1>:     mov    (%rdi),%rax
   0x00000000004a4784 <+4>:     mov    %rdi,%rbx
   0x00000000004a4787 <+7>:     mov    0x20(%rdi),%rdx
   0x00000000004a478b <+11>:    sub    %rax,%rdx
   0x00000000004a478e <+14>:    cmp    $0x7,%rdx
   0x00000000004a4792 <+18>:    jle    0x4a47d4 <Perl_pp_const+84>
   0x00000000004a4794 <+20>:    mov    0x8(%rbx),%rcx
   0x00000000004a4798 <+24>:    lea    0x8(%rax),%rsi
   0x00000000004a479c <+28>:    mov    0x28(%rcx),%rdx
   0x00000000004a47a0 <+32>:    test   %rdx,%rdx
   0x00000000004a47a3 <+35>:    je     0x4a47b8 <Perl_pp_const+56>
   0x00000000004a47a5 <+37>:    mov    %rdx,0x8(%rax)
   0x00000000004a47a9 <+41>:    mov    0x8(%rbx),%rax
   0x00000000004a47ad <+45>:    mov    %rsi,(%rbx)
   0x00000000004a47b0 <+48>:    pop    %rbx
   0x00000000004a47b1 <+49>:    mov    (%rax),%rax
   0x00000000004a47b4 <+52>:    ret
   0x00000000004a47b5 <+53>:    nopl   (%rax)
   0x00000000004a47b8 <+56>:    mov    0x18(%rcx),%rcx
   0x00000000004a47bc <+60>:    mov    0x10(%rbx),%rdx
=> 0x00000000004a47c0 <+64>:    mov    (%rdx,%rcx,8),%rdx
   0x00000000004a47c4 <+68>:    mov    %rdx,0x8(%rax)
   0x00000000004a47c8 <+72>:    mov    0x8(%rbx),%rax
   0x00000000004a47cc <+76>:    mov    %rsi,(%rbx)
   0x00000000004a47cf <+79>:    pop    %rbx
   0x00000000004a47d0 <+80>:    mov    (%rax),%rax
   0x00000000004a47d3 <+83>:    ret
   0x00000000004a47d4 <+84>:    mov    $0x1,%ecx
   0x00000000004a47d9 <+89>:    mov    %rax,%rdx
   0x00000000004a47dc <+92>:    mov    %rax,%rsi
   0x00000000004a47df <+95>:    call   0x4d6860 <Perl_stack_grow>
   0x00000000004a47e4 <+100>:   jmp    0x4a4794 <Perl_pp_const+20>
End of assembler dump.

The line that causes trouble looks like this:

check_exec(safe_exec(qw(/usr/bin/sh -c), join(' ', @cmd)), 0);

The breakpoint was set on safe_exec, which should pass the result of the join as a single parameter to sh -c. check_exec is just a wrapper that checks the exit code and error output that safe_exec will return.

In my understanding the debugger should stop before safe_exec starts to examine the parameters; it starts like this:

sub safe_exec(@)
{
    use IO::File;
    my $result = [];
    my $stderr_tmp = IO::File->new_tmpfile();
    my $cmd = join(' ', map {
        my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
    } @_);

    verbose(1, "safe_exec: executing $cmd");
#...

(verbose() is my debugging routine that basically just outputs its parameters conditionally)
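
To give an idea of what such a routine does, here is a minimal sketch. It is not the real verbose(): the variable $verbosity, the helper verbose_message(), and the exact output format are assumptions made for illustration, loosely modelled on the "[1] ..." output shown further down and on a later call that passes a code reference as an argument.

```perl
use strict;
use warnings;

our $verbosity = 1;    # hypothetical threshold, not from the real program

# Build the message only if $level is within the threshold.
# Code references are called only in that case, so expensive
# arguments cost nothing when the message is suppressed.
sub verbose_message {
    my ($level, @parts) = @_;
    return if $level > $verbosity;
    return "[$level] "
        . join(' ', map { ref($_) eq 'CODE' ? $_->() : $_ } @parts);
}

# Print the message to STDERR if there is one.
sub verbose {
    my $msg = verbose_message(@_);
    print STDERR "$msg\n" if defined $msg;
    return;
}

verbose(1, 'safe_exec: executing', 'some command');
verbose(5, 'suppressed:', sub { die 'never evaluated' });
```

The second call produces no output and never runs the code reference, because level 5 is above the assumed threshold of 1.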

The really odd facts are:

Like this:

main::send_collected_messages(../ldap-user-check:3516):
3516:               check_exec(safe_exec(qw(/usr/bin/sh -c), join(' ', @cmd)), 0);
  DB<4> s
main::safe_exec(../ldap-user-check:696):
696:        my $result = [];
  DB<4> n
main::safe_exec(../ldap-user-check:697):
697:        my $stderr_tmp = IO::File->new_tmpfile();
  DB<4> n
main::safe_exec(../ldap-user-check:699):
699:            my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
  DB<4> n
main::safe_exec(../ldap-user-check:698):
698:        my $cmd = join(' ', map {
  DB<4> n
main::safe_exec(../ldap-user-check:699):
699:            my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
#...
  DB<4> n
main::safe_exec(../ldap-user-check:699):
699:            my $a = $_; $a =~ s/\\/\\\\/g; $a =~ s/"/\\"/g; $a;
  DB<4> n
main::safe_exec(../ldap-user-check:702):
702:        verbose(1, "safe_exec: executing $cmd");
  DB<4> n
[1] safe_exec: executing /usr/bin/sh -c /usr/bin/mailx -n -a '/tmp/msg-qhHO9Y.txt' -a '/tmp/msg-fI7Tuk.txt' (...) < '/tmp/msg-nQXQP9.txt'

("..." indicates that more similar arguments were left out)

The check_exec(safe_exec(...)) construct had been in use in the way I described for a long time, and I never had a core dump. Also, those functions had not been changed for weeks; only the functions using them were changed.

Anybody have an idea what might cause that? A bug in the garbage collector?

It's getting stranger: after the stepping session above, where there was no problem, I deleted all breakpoints and set a breakpoint on safe_exec only, and then it did not dump core! But when I started a fresh debugger session and set a breakpoint on safe_exec only, it dumped core like this:

[5] send_collected_messages: shell command is /usr/bin/mailx -n -a '/tmp/msg-2_OlWE.txt' -a '/tmp/msg-XKWeqq.txt' (...REDACTED DETAILS...) < '/tmp/msg-8J9ywX.txt'
Signal SEGV at ../ldap-user-check line 3516.
        main::send_collected_messages('HASH(0x30088d0)') called at ../ldap-user-check line 3773
        main::check_accounts('HASH(0x2481f68)', 'HASH(0x2494a80)') called at ../ldap-user-check line 4104
        main::main() called at ../ldap-user-check line 4128
./ppolicy-next.sh: line 38:  4335 Aborted                 (core dumped) perl -d 

(./ppolicy-next.sh is a shell script to call the debugger with the same parameters always)

Lines 3515 and 3516 are:

        verbose(5, "${me}:", 'shell command is', sub { join(' ', @cmd) });
        check_exec(safe_exec(qw(/usr/bin/sh -c), join(' ', @cmd)), 0);

In the journal I also find these messages for each core dump:

AT_NULL terminator not found, cannot parse auxv structure.

An Interesting Observation

I think I'm getting closer to solving the riddle: one of the recent changes was the use of a tied DB_File hash, and I realized that after the crash the data added to the tied hash was missing from the file.

So I added a sync on the database object. Interestingly, the program now does not crash any more!

So it seems the crash happens when there is unsynced "dirty" data in the tied hash. I have never heard of such a thing before, especially as it's far from obvious that DB_File might be involved.
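
For illustration, syncing a tied DB_File hash can look roughly like the sketch below. The file path, key, and value are made up for the example and are not taken from the real program; the only point is that tie returns the database object on which sync() can be called, and that sync() returns 0 on success.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical scratch location, not the path used by the real program
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.db";

my %cache;
# tie returns the underlying DB_File object; keep it to call sync() later
my $db = tie(%cache, 'DB_File', $file, O_RDWR | O_CREAT, 0600, $DB_HASH)
    or die "cannot tie $file: $!";

$cache{'uid=example'} = 'notified';    # hypothetical key and value

# Flush cached ("dirty") data to the file; sync() returns 0 on success
$db->sync() == 0 or die "sync failed: $!";

# Reopen the file read-only to look at what is on disk after the sync
my %check;
tie(%check, 'DB_File', $file, O_RDONLY, 0600, $DB_HASH)
    or die "cannot reopen $file: $!";
my $on_disk = $check{'uid=example'};
untie %check;

print "on disk after sync: ", (defined $on_disk ? $on_disk : '(missing)'), "\n";

undef $db;       # drop the extra reference before untie
untie %cache;
```

Whether a sync after every write is acceptable depends on how often the hash is updated; syncing once just before calling external programs would be a cheaper alternative to try.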

Answers (0)
