Reputation: 3727
I'm debugging an "Access violation" exception in a large C++ application (Visual Studio 2015). The application is built from several libraries, and the problem occurs in one of them (SystemC), although I suspect the source of the problem is elsewhere.
m_update_phase = true;
m_prim_channel_registry->perform_update();
m_update_phase = false;
inline
void
sc_prim_channel_registry::perform_update()
{
for( int i = m_update_last; i >= 0; -- i ) {
m_update_array[i]->perform_update();
}
m_update_last = -1;
}
(These are excerpts from systemc\kernel\sc_simcontext.cpp and systemc\communication\sc_prim_channel.h, part of the SystemC library.)
The error happens after several iterations through the code above. The call to m_prim_channel_registry->perform_update() throws the exception "0xC0000005: Access violation writing location 0x0F4CD9E9."
This happens only when building the application in Release configuration.
Looking at the assembly code, I see that the function sc_prim_channel_registry::perform_update() was inlined, and the inner function call m_update_array[i]->perform_update() seems to corrupt the stack frame of the calling function.
When m_update_last = -1; is executed, &m_update_last no longer points to a valid memory location and the exception is thrown. (m_update_last is a plain member of class sc_prim_channel_registry with type int.)
m_update_phase = true;
m_prim_channel_registry->perform_update();
1034D99E mov eax,dword ptr [esi+10h]
1034D9A1 mov byte ptr [esi+0A3h],1
1034D9A8 mov dword ptr [ebp-18h],eax
1034D9AB mov ebx,dword ptr [eax+28h]
1034D9AE test ebx,ebx
1034D9B0 js $LN163+0FEh (1034D9D0h)
1034D9B2 mov esi,eax
1034D9B4 mov eax,dword ptr [esi+20h]
1034D9B7 mov edi,dword ptr [eax+ebx*4]
1034D9BA mov ecx,edi
1034D9BC mov eax,dword ptr [edi]
1034D9BE call dword ptr [eax+14h]
1034D9C1 sub ebx,1
1034D9C4 mov byte ptr [edi+1Ch],0
1034D9C8 jns $LN163+0E2h (1034D9B4h)
1034D9CA mov esi,dword ptr [this]
1034D9CD mov eax,dword ptr [ebp-18h]
1034D9D0 mov dword ptr [eax+28h],0FFFFFFFFh
m_update_phase = false;
The exception is thrown at address 1034D9D0.
So the last instructions being executed are:
0F97D9CD mov eax,dword ptr [ebp-18h]
0F97D9D0 mov dword ptr [eax+28h],0FFFFFFFFh
The address of m_prim_channel_registry is in [ebp-18h] and eax, and [eax+28h] is m_update_last.
Looking in the watch window at esp and ebp before the inner call to perform_update(), I see that:
ebp-18h 0x0022fd5c unsigned int
esp 0x0022fd60 unsigned int
This is strange. The difference between them is only 4 bytes, so the next push onto the stack will move esp down to the same address and overwrite [ebp-18h]!
[ebp-18h] holds a copy of this->m_prim_channel_registry. The instruction 1034D9BE call dword ptr [eax+14h] pushes a return address onto the stack and so corrupts the contents of [ebp-18h]. It looks like the stack has grown (downwards) too much and corrupts the previous frame.
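The overlap described above can be checked with a small model. This is only a sketch: FakeStack and push_clobbers are hypothetical names, and the model assumes a 32-bit x86 push, which first decrements esp by 4 and then stores at the new esp. The addresses are the ones from the watch window and the disassembly.

```cpp
#include <cstdint>
#include <map>

// Hypothetical model of a 32-bit x86 stack: address -> stored dword.
struct FakeStack {
    std::uint32_t esp;
    std::map<std::uint32_t, std::uint32_t> mem;

    // PUSH (and the return-address push done by CALL) first decrements
    // esp by 4, then stores the value at the new esp.
    void push(std::uint32_t value) {
        esp -= 4;
        mem[esp] = value;
    }
};

// Returns true if a single push starting from `esp` overwrites the dword
// stored at address `slot`.
bool push_clobbers(std::uint32_t esp, std::uint32_t slot) {
    const std::uint32_t marker = 0xAAAAAAAAu; // stand-in for the saved registry pointer
    FakeStack s{esp, {}};
    s.mem[slot] = marker;
    s.push(0x1034D9C1u); // return address pushed by the call at 1034D9BE
    return s.mem[slot] != marker;
}
```

With esp = 0x0022fd60 and the slot at 0x0022fd5c, push_clobbers returns true. With esp one dword higher, the slot survives a single push.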
I believe I found the problem, but I can't say I understand it completely.
In the same function (sc_simcontext::crunch) where the outer perform_update() is invoked, SystemC methods are invoked:
// execute method processes
sc_method_handle method_h = pop_runnable_method();
while( method_h != 0 ) {
try {
method_h->execute();
}
catch( const sc_exception& ex ) {
cout << "\n" << ex.what() << endl;
m_error = true;
return;
}
method_h = pop_runnable_method();
}
These methods are deferred function calls registered earlier.
One of these methods was returning by executing ret 4, which pops 4 bytes more than the caller had pushed. Every call therefore left esp 4 bytes higher and shrank the caller's stack frame, to the point where the corruption described above happened.
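The drift can be sketched as simple arithmetic. The helper below is hypothetical and assumes the caller does no cleanup of its own after the call; it only tracks how esp moves when the bytes pushed by the caller and the bytes released by ret N disagree.

```cpp
#include <cstdint>

// Model of esp after `calls` invocations in which the caller pushes
// `pushed_bytes` of arguments and the callee returns with `ret callee_pops`.
// Assumption: the caller does not adjust esp after the call.
std::uint32_t esp_after_calls(std::uint32_t esp, unsigned calls,
                              std::uint32_t pushed_bytes,
                              std::uint32_t callee_pops) {
    for (unsigned i = 0; i < calls; ++i) {
        esp -= pushed_bytes; // caller pushes the arguments
        esp -= 4;            // call pushes the return address
        esp += 4;            // ret pops the return address
        esp += callee_pops;  // "ret N" additionally releases N bytes
    }
    return esp;
}
```

When the caller pushes nothing and the callee executes ret 4, esp ends up 4 bytes higher after every call. When both sides agree on 4 bytes, esp returns to where it started.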
And how did I manage to register a corrupted SystemC method?
Apparently it's a bad idea to use SC_METHOD(f) when f is a virtual function of the module. Doing that caused a different, unrelated "random" function to be called.
I'm not exactly sure why it happens this way or why this limitation exists. I also don't remember seeing any warning about using virtual member functions as SystemC methods, but it was definitely the problem. When debugging the method registration in the SC_METHOD call itself, I could see that the stored function pointer pointed to a different function from the one given to the SC_METHOD macro.
To fix the problem I called SC_METHOD(wrapper_f), where wrapper_f is a simple non-virtual member function of the module that does nothing but call f, the original virtual function. That's it.
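The shape of that workaround, sketched without any SystemC API: module_base, derived_module, method_ptr and run_registered are hypothetical stand-ins, and run_registered only imitates a registry that stores a member function pointer and calls it later.

```cpp
#include <string>

// Hypothetical stand-in for a module class; no SystemC types are used.
struct module_base {
    virtual ~module_base() = default;

    // The virtual function that should run as the process body.
    virtual std::string f() { return "base::f"; }

    // Non-virtual wrapper: this is what gets registered instead of f.
    // The virtual dispatch happens inside the wrapper body.
    std::string wrapper_f() { return f(); }
};

struct derived_module : module_base {
    std::string f() override { return "derived::f"; }
};

using method_ptr = std::string (module_base::*)();

// Hypothetical registry call: invokes a stored member function pointer.
std::string run_registered(module_base& m, method_ptr p) {
    return (m.*p)();
}
```

Calling run_registered(d, &module_base::wrapper_f) on a derived_module d still reaches derived_module::f, so the override behaviour is kept while the registered pointer refers to a non-virtual function.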
Upvotes: 5
Views: 1347
Reputation:
You are probably having issues with member function pointers on MSVC.
Consider the following code, file main.cpp:
#include <cstdio>
struct base;
typedef void (base::*baseptr_t)();
struct base {
void foo() { }
};
void callfoo(base *obj, baseptr_t ptr);
int main()
{
base obj;
std::printf("sizeof(baseptr_t)=%zu\n", sizeof(baseptr_t));
callfoo(&obj, &base::foo);
}
and file callfoo.cpp:
#include <cstdio>
struct base;
typedef void (base::*baseptr_t)();
void callfoo(base *obj, baseptr_t ptr)
{
std::printf("sizeof(baseptr_t)=%zu\n", sizeof(baseptr_t));
(obj->*ptr)();
}
On x86_64 this prints:
sizeof(baseptr_t)=8
sizeof(baseptr_t)=24
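before crashing with an access violation.
This is because MSVC generates 8-byte pointers for classes whose definition is known, but has to generate 24-byte pointers if the class definition is not available at the point where the pointer type is declared.
The compiler has ways to control this behavior, such as the /vmg compiler option and the pointers_to_members pragma.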
PS: I wasn't able to reproduce this, but you can also check the sc_process.h header from SystemC, which has the following lines:
#if defined(_MSC_VER)
#if ( _MSC_VER > 1200 )
# define SC_USE_MEMBER_FUNC_PTR
#endif
#else
# define SC_USE_MEMBER_FUNC_PTR
#endif
You can try to undefine this macro for your build; in that case SystemC will use a different strategy when calling the process function.
PS2: A member function pointer can be 8, 16 or 24 bytes in size depending on the class hierarchy, so there have to be 3 ways to dereference a member function pointer, and each way has to handle both virtual and non-virtual calls.
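To see which representation a given compiler picks, the sizes can be inspected directly. This is a sketch with hypothetical class names. All definitions are complete before any pointer type is formed, and the values are implementation-specific, so none are stated here.

```cpp
#include <cstddef>

// Hypothetical classes covering the three kinds of hierarchy.
struct A { void f() {} };
struct B { void g() {} };
struct Single : A {};          // single inheritance
struct Multiple : A, B {};     // multiple inheritance
struct Virtual : virtual A {}; // virtual inheritance

// Size of a pointer to a member function of T with signature void().
// T is complete here, so the compiler can choose its representation.
template <class T>
constexpr std::size_t memfn_ptr_size() {
    return sizeof(void (T::*)());
}
```

Printing memfn_ptr_size<Single>(), memfn_ptr_size<Multiple>() and memfn_ptr_size<Virtual>() with %zu in each translation unit of a build is one way to spot a mismatch like the one shown above.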
Upvotes: 3
Reputation: 13510
It seems you know what you are doing.
I can offer advice rather than a solution, but this is something I have encountered more than a few times as a cause of stack corruption.
Check the function causing the corruption, perform_update(). Does it define a big array as a local variable? If so, it probably exceeds the stack and overwrites the return address and other important data there. This is the most common cause of stack corruption I have encountered.
It is a sneaky problem because it depends on the size of the local array and on the amount of stack you have, which varies from system to system.
Upvotes: 0