shiki

Why do I get a segmentation fault when using qemu-riscv64 to run the OpenBLAS tests?

I'm cross-compiling OpenBLAS (host: Ubuntu 20.04.6 LTS, x86_64; target: riscv64gcv) the same way the GitHub Actions workflow does when it tests the riscv64_zvl128b target.

My code is at commit 8483a71, the same commit the CI tested, and my RISC-V toolchain is riscv64-glibc-ubuntu-20.04-llvm-nightly-2024.02.02-nightly. My build script for the OpenBLAS library:

#!/bin/bash

export SRC_PATH=`pwd`
export RISCV_TOOLCHAIN=${SRC_PATH}/opt/riscv
export TOOLCHAIN_VERSION=13.2.0
export PATH=${RISCV_TOOLCHAIN}/bin:$PATH
make TARGET=RISCV64_ZVL128B CFLAGS="-DTARGET=RISCV64_ZVL128B" \
CC='clang --rtlib=compiler-rt -target riscv64-unknown-linux-gnu --sysroot ${RISCV_TOOLCHAIN}/sysroot --gcc-toolchain=${RISCV_TOOLCHAIN}/lib/gcc/riscv64-unknown-linux-gnu/${TOOLCHAIN_VERSION}' \
AR='riscv64-unknown-linux-gnu-ar' AS='riscv64-unknown-linux-gnu-gcc' LD='riscv64-unknown-linux-gnu-gcc' \
RANLIB='riscv64-unknown-linux-gnu-ranlib' \
FC='riscv64-unknown-linux-gnu-gfortran'  TARGET=RISCV64_ZVL128B BINARY=64 ARCH=riscv64 \
HOSTCC=gcc HOSTFC=gfortran -j

It appears to build without errors:

OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)

  OS               ... Linux             
  Architecture     ... riscv64               
  BINARY           ... 64bit                 
  C compiler       ... CLANG  (cmd & version : clang version 17.0.2 (https://github.com/llvm/llvm-project.git b2417f51dbbd7435eb3aaf203de24de6754da50e))
  Fortran compiler ... GFORTRAN  (cmd & version : GNU Fortran () 13.2.0)
  Library Name     ... libopenblas_riscv64_zvl128bp-r0.3.28.dev.a (Multi-threading; Max num-threads is 32)

To install the library, you can run "make PREFIX=/path/to/your/installation install".

Note that any flags passed to make during build should also be passed to make install
to circumvent any install errors.

My build script for the OpenBLAS tests:

#!/bin/bash

export src_path=`pwd`
export RISCV_TOOLCHAIN=${src_path}/opt/riscv
export riscv_gnu_toolchain_version=13.2.0
export PATH=${RISCV_TOOLCHAIN}/bin:$PATH
# build OpenBLAS tests
make TARGET=RISCV64_ZVL128B CFLAGS="-DTARGET=RISCV64_ZVL128B" \
CC='riscv64-unknown-linux-gnu-gcc' \
AR='riscv64-unknown-linux-gnu-ar' AS='riscv64-unknown-linux-gnu-gcc' LD='riscv64-unknown-linux-gnu-gcc' \
RANLIB='riscv64-unknown-linux-gnu-ranlib' \
FC='riscv64-unknown-linux-gnu-gfortran' TARGET=RISCV64_ZVL128B BINARY=64 ARCH=riscv64 \
HOSTCC=gcc HOSTFC=gfortran -j tests 

This also completes without errors.

Then I run the script that executes the OpenBLAS tests:

#!/bin/bash

export SRC_PATH=`pwd`
export RISCV_TOOLCHAIN=${SRC_PATH}/opt/riscv
export TOOLCHAIN_VERSION=13.2.0
export PATH=${RISCV_TOOLCHAIN}/bin:$PATH

# run OpenBLAS tests
export QEMU_CPU=rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=128,elen=64
rm -rf ./test_out
mkdir -p ./test_out
run_test() {
  local DIR=$1; local CMD=$2; local DATA=$3; local OUTPUT="./test_out/$DIR.$CMD"
  echo "`pwd`/$DIR/$CMD $DIR/$DATA" >> $OUTPUT
  if [[ -z $DATA ]]; then qemu-riscv64 ./$DIR/$CMD |& tee $OUTPUT
  else qemu-riscv64 ./$DIR/$CMD < ./$DIR/$DATA |& tee $OUTPUT ; fi
  RV=$? ; if [[ $RV != 0 ]]; then echo "*** FAIL: nonzero exit code $RV" >> $OUTPUT ; fi
}
run_test test cblat1 &
run_test test cblat2 cblat2.dat &
run_test test cblat3 cblat3.dat &
run_test test dblat1 &
run_test test dblat2 dblat2.dat &
run_test test dblat3 dblat3.dat &
run_test test sblat1 &
run_test test sblat2 sblat2.dat &
run_test test sblat3 sblat3.dat &
run_test test zblat1 &
run_test test zblat2 zblat2.dat &
run_test test zblat3 zblat3.dat &
run_test ctest xccblat1 &
run_test ctest xccblat2 cin2 &
run_test ctest xccblat3 cin3 &
run_test ctest xdcblat1 &
run_test ctest xdcblat2 din2 &
run_test ctest xdcblat3 din3 &
run_test ctest xscblat1 &
run_test ctest xscblat2 sin2 &
run_test ctest xscblat3 sin3 &
run_test ctest xzcblat1 &
run_test ctest xzcblat2 zin2 &
run_test ctest xzcblat3 zin3 &
wait
grep -lZ FAIL ./test_out/* > temp_log_file
while IFS= read -r -d $'\0' LOG; do cat $LOG ; FAILURES=1 ; done < <(grep -lZ FAIL ./test_out/*)
if [[ ! -z $FAILURES ]]; then echo "==========" ; echo "== FAIL ==" ; echo "==========" ; echo ; exit 1 ; fi
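One detail of run_test worth noting: because qemu-riscv64 is piped into tee, RV=$? holds tee's exit status rather than the emulator's, so a crashing test is not recorded as FAIL, and tee without -a truncates the header line that echo wrote just before. A sketch of a variant (an assumption, not the CI script) that captures the emulator's status from bash's PIPESTATUS array and also flags logs with no test output at all; QEMU_RUNNER is a hypothetical override added only so the function can be exercised without an emulator:

```shell
#!/bin/bash
# Sketch of a run_test variant (not the CI script). Same arguments as above:
#   run_test DIR CMD [DATA]
# QEMU_RUNNER is a hypothetical override (defaults to qemu-riscv64).
# QEMU_CPU is read by qemu-riscv64 from the environment, as in the original.
run_test() {
  local DIR=$1 CMD=$2 DATA=$3
  local OUTPUT="./test_out/$DIR.$CMD"
  local RUNNER=${QEMU_RUNNER:-qemu-riscv64}
  local RV
  # Write the header first; tee -a below appends instead of truncating it.
  echo "$(pwd)/$DIR/$CMD $DIR/$DATA" > "$OUTPUT"
  if [[ -z $DATA ]]; then
    "$RUNNER" "./$DIR/$CMD" |& tee -a "$OUTPUT"
    RV=${PIPESTATUS[0]}   # status of the emulator, not of tee
  else
    "$RUNNER" "./$DIR/$CMD" < "./$DIR/$DATA" |& tee -a "$OUTPUT"
    RV=${PIPESTATUS[0]}
  fi
  # Only the header line present means the test printed nothing.
  if [[ $(wc -l < "$OUTPUT") -le 1 ]]; then
    echo "*** FAIL: no output from $CMD" >> "$OUTPUT"
  fi
  if [[ $RV != 0 ]]; then
    echo "*** FAIL: nonzero exit code $RV" >> "$OUTPUT"
  fi
}
```

It is called exactly like the original, e.g. run_test test cblat2 cblat2.dat &, so the existing grep for FAIL picks up both kinds of failure.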

This produces the following errors:

./ctest/xdcblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/cblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/sblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xdcblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xccblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/sblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/sblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xscblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/zblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xscblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/zblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xzcblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xccblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/cblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/dblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xccblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xzcblat1: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/dblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xscblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/dblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/cblat2: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./test/zblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xzcblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory
./ctest/xdcblat3: error while loading shared libraries: libgfortran.so.5: cannot open shared object file: No such file or directory

At first I thought the test programs were dynamically linked and QEMU couldn't find libgfortran.so.5, which is located at /opt/riscv/sysroot/lib/libgfortran.so.5, so I added /opt/riscv/sysroot/lib to LD_LIBRARY_PATH (the CI test has no such step). After that the script runs without errors, but the tests produce no output, so something is probably still failing. Then I ran the equivalent of run_test test cblat1 manually:

qemu-riscv64 -cpu rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=128,elen=64 test/cblat1

and got this error:

Segmentation fault (core dumped)
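For reference, the manual invocation above combined with the LD_LIBRARY_PATH workaround can be written as one function. This is a sketch under assumptions: it uses qemu-riscv64's -L option (ELF interpreter prefix, the same setting as QEMU_LD_PREFIX) pointed at the toolchain sysroot, and -E to set LD_LIBRARY_PATH only in the guest's environment instead of exporting it on the host. run_under_qemu, QEMU_CPU_MODEL and DRY_RUN are hypothetical names used for illustration.

```shell
#!/bin/bash
# Sketch: run one test binary under qemu-riscv64 with the cross sysroot.
# RISCV_TOOLCHAIN must point at the toolchain root used in the scripts above.
# run_under_qemu, QEMU_CPU_MODEL and DRY_RUN are hypothetical names.
QEMU_CPU_MODEL="rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=128,elen=64"

run_under_qemu() {
  local bin=$1; shift
  local sysroot=${RISCV_TOOLCHAIN:?RISCV_TOOLCHAIN is not set}/sysroot
  # -cpu: same model as QEMU_CPU above
  # -L:   ELF interpreter prefix (same as QEMU_LD_PREFIX)
  # -E:   set a variable only in the guest environment
  local -a cmd=(qemu-riscv64
                -cpu "$QEMU_CPU_MODEL"
                -L "$sysroot"
                -E "LD_LIBRARY_PATH=$sysroot/lib"
                "$bin" "$@")
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

Input files are passed on stdin as in run_test, e.g. RISCV_TOOLCHAIN=$PWD/opt/riscv run_under_qemu ./test/cblat2 < ./test/cblat2.dat.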

Here is the backtrace of the qemu-riscv64 process:

Thread 1 "qemu-riscv64" received signal SIGSEGV, Segmentation fault.
0x00007fffe82a6c5a in code_gen_buffer ()
(gdb) bt
#0  0x00007fffe82a6c5a in code_gen_buffer ()
#1  0x000055555565850c in cpu_tb_exec (tb_exit=<synthetic pointer>, 
    itb=<optimized out>, cpu=0x5555558542c0)
    at ../qemu/accel/tcg/cpu-exec.c:458
#2  cpu_loop_exec_tb (tb_exit=<synthetic pointer>, 
    last_tb=<synthetic pointer>, pc=<optimized out>, 
    tb=<optimized out>, cpu=0x5555558542c0)
    at ../qemu/accel/tcg/cpu-exec.c:920
#3  cpu_exec_loop (cpu=cpu@entry=0x5555558542c0, sc=<optimized out>)
    at ../qemu/accel/tcg/cpu-exec.c:1041
#4  0x0000555555658b89 in cpu_exec_setjmp (
    cpu=cpu@entry=0x5555558542c0, sc=<optimized out>)
    at ../qemu/accel/tcg/cpu-exec.c:1058
#5  0x0000555555658bea in cpu_exec (cpu=cpu@entry=0x5555558542c0)
    at ../qemu/accel/tcg/cpu-exec.c:1084
#6  0x00005555555a62c8 in cpu_loop (env=0x555555856a80)
    at ../qemu/linux-user/riscv/cpu_loop.c:37
#7  0x000055555559c6ca in main (argc=<optimized out>, 
    argv=<optimized out>, envp=<optimized out>)
    at ../qemu/linux-user/main.c:1014

I also tried debugging the guest program remotely with gdb through QEMU's gdb stub:

Reading symbols from /home/work/OpenBLAS/test/cblat1...
(No debugging symbols found in /home/work/OpenBLAS/test/cblat1)
(gdb) target remote:1234
Remote debugging using :1234
Reading symbols from /home/work/OpenBLAS/opt/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1...
(No debugging symbols found in /home/work/OpenBLAS/opt/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1)
BFD: warning: system-supplied DSO at 0x2aaaab2ce000 has a corrupt string table index
0x00002aaaab2acda0 in _dl_call_fini ()
   from /home/work/OpenBLAS/opt/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1
(gdb) b main
Breakpoint 1 at 0x11a14
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaab4a750a in ?? ()
(gdb) bt
#0  0x00002aaaab4a750a in ?? ()

I also ran the program in a real RISC-V environment (a development board), where it runs correctly, so the problem probably isn't the program itself. I'm not sure how to debug this situation.

Thanks for your help.
