Rella

Reputation: 66945

Invisible SIGSEGV on Linux that does not happen on Windows?

INTRO

I have a TCP/HTTP server that supports plugins in the form of shared libraries (DLL and .so). Its build system uses premake to generate makefiles and .sln files. When I start my application, I feed it a configuration file like this, describing which libraries the server should use as plugins and which arguments it should pass to them. For some time I had two plugins and everything worked fine, and it still works fine if I feed my server config files like this. But now I have a new plugin I am developing, and so a new config file.

SETUP

The steps required to set up my server on Linux are few and simple


But we need a debug version, so after we call the script we

PROBLEM

So we run it, and this is what we see:

 gdb ./CloudServer

GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer...done.
(gdb) r
Starting program: /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer
[Thread debugging using libthread_db enabled]
Cloud Server v0.5
Copyright (c) 2011 Cloud Forever. All rights reserved.

Type 'help' to see help messages.
Config file path: config.xml
[New Thread 0x7ffff5967700 (LWP 11516)]
[New Thread 0x7ffff5166700 (LWP 11517)]
[New Thread 0x7ffff4965700 (LWP 11518)]
[New Thread 0x7ffff4164700 (LWP 11519)]
[New Thread 0x7ffff3963700 (LWP 11520)]
[New Thread 0x7ffff3162700 (LWP 11521)]
[New Thread 0x7ffff2961700 (LWP 11522)]
[New Thread 0x7ffff2160700 (LWP 11523)]
[New Thread 0x7ffff195f700 (LWP 11524)]
[New Thread 0x7ffff115e700 (LWP 11525)]
[New Thread 0x7ffff095d700 (LWP 11526)]
[New Thread 0x7fffebfff700 (LWP 11527)]
[New Thread 0x7fffeb7fe700 (LWP 11528)]
[New Thread 0x7fffeaffd700 (LWP 11529)]
[New Thread 0x7fffea7fc700 (LWP 11530)]
[New Thread 0x7fffe9ffb700 (LWP 11531)]
Library libFileService.so opened.
[New Thread 0x7fffe953c700 (LWP 11532)]
Library libUsersFilesService.so opened.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) x/i $pc
0x0:    Cannot access memory at address 0x0

I am a Linux newbie, and all I know about segmentation faults I know from Wikipedia. But I know one more thing about my server and this new service I am creating: it compiles and runs on Windows with no errors at all (VS2008 and VS2010 solutions can be created from the same premake script).

So I wonder how and where in these two files, the .cpp and the .h, I have introduced an error that does not show up on Windows at all and shows up so dramatically on Linux. And is it fixable, or visible to a fresh eye?

UPDATE: Valgrind output

ole_jak@dspproc:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$ valgrind ./CloudServer
==11682== Memcheck, a memory error detector
==11682== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11682== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==11682== Command: ./CloudServer
==11682==
Cloud Server v0.5
Copyright (c) 2011 Cloud Forever. All rights reserved.

Type 'help' to see help messages.
Config file path: config.xml
Library libFileService.so opened.
Library libUsersFilesService.so opened.
==11682== Jump to the invalid address stated on the next line
==11682==    at 0x0: ???
==11682==    by 0x4D49BE: sqlite3_free (sqlite3.c:18155)
==11682==    by 0x102242D5: sqlite3OsInit (sqlite3.c:14162)
==11682==    by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299)
==11682==    by 0x102A159F: openDatabase (sqlite3.c:108909)
==11682==    by 0x102A1B29: sqlite3_open (sqlite3.c:109156)
==11682==    by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89)
==11682==    by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74)
==11682==    by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171)
==11682==    by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38)
==11682==    by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156)
==11682==    by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208)
==11682==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==11682==
==11682==
==11682== Process terminating with default action of signal 11 (SIGSEGV)
==11682==  Bad permissions for mapped region at address 0x0
==11682==    at 0x0: ???
==11682==    by 0x4D49BE: sqlite3_free (sqlite3.c:18155)
==11682==    by 0x102242D5: sqlite3OsInit (sqlite3.c:14162)
==11682==    by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299)
==11682==    by 0x102A159F: openDatabase (sqlite3.c:108909)
==11682==    by 0x102A1B29: sqlite3_open (sqlite3.c:109156)
==11682==    by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89)
==11682==    by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74)
==11682==    by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171)
==11682==    by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38)
==11682==    by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156)
==11682==    by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208)
==11682==
==11682== HEAP SUMMARY:
==11682==     in use at exit: 124,050 bytes in 1,083 blocks
==11682==   total heap usage: 1,814 allocs, 731 frees, 183,517 bytes allocated
==11682==
==11682== LEAK SUMMARY:
==11682==    definitely lost: 0 bytes in 0 blocks
==11682==    indirectly lost: 0 bytes in 0 blocks
==11682==      possibly lost: 46,248 bytes in 799 blocks
==11682==    still reachable: 77,802 bytes in 284 blocks
==11682==         suppressed: 0 bytes in 0 blocks
==11682== Rerun with --leak-check=full to see details of leaked memory
==11682==
==11682== For counts of detected and suppressed errors, rerun with: -v
==11682== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Killed
ole_jak@dspproc:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$

Upvotes: 1

Views: 755

Answers (2)

thkala

Reputation: 86353

This is a nasty one. I am unsure about the exact root cause, but this seems to be a multi-threading related issue. The immediate cause of the problem is that the sqlite3Config.m.xSize function pointer is NULL at the place and time the error happens.

This pointer is supposed to be initialized to point to a proper function the first time that sqlite3_initialize() is called, which normally happens the first time you open an SQLite database file. By setting breakpoints and watchpoints in GDB I was able to verify that the pointer is successfully set, yet at the time of the segmentation fault its value is NULL.
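
To make the failure mode concrete, here is a minimal sketch of a function-pointer table that stays null until an init routine fills it in. It is not SQLite's actual code: `MemMethods`, `g_config`, `fixed_size`, `init_config`, `config_ready` and `safe_size` are hypothetical names invented for illustration, and only the `xSize` member name is borrowed from the text above.

```cpp
#include <cstddef>

// Hypothetical stand-in for a global configuration structure: a table of
// function pointers that stays null until an init routine fills it in.
struct MemMethods {
    std::size_t (*xSize)(void*);  // null until initialized
};

static MemMethods g_config = { nullptr };

// Stand-in size routine; a real allocator would report the block size.
static std::size_t fixed_size(void*) { return 16; }

// Fills in the table, as an initialization routine would.
void init_config() { g_config.xSize = fixed_size; }

// Returns true if calling through the table is safe. Calling
// g_config.xSize(p) while the pointer is null jumps to address 0,
// which is what a "0x0000000000000000 in ?? ()" frame looks like in GDB.
bool config_ready() { return g_config.xSize != nullptr; }

// Guarded call: returns 0 instead of crashing when the table is empty.
std::size_t safe_size(void* p) {
    return config_ready() ? g_config.xSize(p) : 0;
}
```

An unguarded call through `g_config.xSize` before `init_config()` has run would produce the same kind of jump to address 0 that the GDB and Valgrind output in the question shows.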

That could mean one of two things:

  • The new pointer value is not properly propagated to all threads. SQLite3 is supposed to be thread-safe, but well, threads can be nasty little buggers...

  • Something resets the pointer after it has been initialized. I considered this highly unlikely since the sqlite3Config structure is not usually modified after initialization.

I performed a simple test, which incidentally can be used as a temporary workaround: I added an explicit call to sqlite3_initialize() as the first statement in main(), allowing it to be executed before any threads are launched. As a result, the segmentation fault went away and I got a shell prompt for your server, which points to the first of the two alternatives. Note that this is a workaround at best, since SQLite normally calls sqlite3_initialize() automatically and an explicit call should not be needed. The root cause of the issue may still be present and make itself known otherwise, or, worse, it could break things in subtle, hard-to-detect ways.
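
The ordering behind that workaround can be sketched as follows. This is an illustration under assumptions, not the server's code: `library_initialize` is a hypothetical stand-in for the explicit initialization call, and `g_initialized`, `worker` and `run_workers_after_init` are invented names, since the way the server actually launches its threads is not shown in the question.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library's one-time global setup; in the real
// server this would be the explicit initialization call placed in main().
static std::atomic<bool> g_initialized(false);
void library_initialize() { g_initialized.store(true); }

// Hypothetical worker: counts itself if it observed an initialized library.
static void worker(std::atomic<int>& ready_count) {
    if (g_initialized.load()) ++ready_count;
}

// Initializes first, then launches n workers and returns how many of them
// saw the library as initialized.
int run_workers_after_init(int n) {
    library_initialize();  // runs on the calling thread, before any worker exists
    std::atomic<int> ready_count(0);
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i)
        pool.emplace_back(worker, std::ref(ready_count));
    for (auto& t : pool) t.join();
    return ready_count.load();
}
```

Because every thread is created after the setup has finished, no worker can run against a half-initialized library, which is the property the explicit call in main() relies on.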

Since SQLite3 is supposed to be thread-safe (and the source code of the sqlite3_initialize() function seems correct in that regard), I am unsure what is happening. It could be a problem with the sqlite3pp wrapper or with the way the threads are launched.

Upvotes: 2

h4ck3rm1k3

Reputation: 2100

Here are my suggestions.

  1. Turn off optimizations. Sometimes optimizations cause errors. Use -O0, for example.
  2. Remove dynamic loading: try linking your code in statically, and see if the problem still occurs.
  3. Reduce the size of the problem. Make the smallest possible program that can reproduce the error and then post it here.

Thanks, Mike

Upvotes: 0
