Reputation: 213809
I have a multi-threaded Windows program that does asynchronous serial port I/O through "raw" Win API calls. It works perfectly on every Windows version except Windows 7/64.
The problem is that the program can find and set up the COM port just fine, but it can neither send nor receive any data. Whether I compile the binary on Windows XP or on Windows 7, I cannot send or receive on Win 7/64. Compatibility mode, running as administrator, etc. do not help.
I have managed to narrow the problem down to the FileIOCompletionRoutine callback. Every time it is called, dwErrorCode is 0 and dwNumberOfBytesTransfered is 0. GetOverlappedResult() called from inside the function always returns TRUE (everything OK), and it appears to set lpNumberOfBytesTransferred correctly. But the lpOverlapped parameter is corrupt: it is a garbage pointer pointing at garbage values.
I can tell that it is corrupt either by checking in the debugger the address at which the real OVERLAPPED struct is allocated, or by setting a temporary global variable to point at that struct and comparing.
My question is: why does this happen, and why only on Windows 7/64? Is there some calling-convention issue that I am not aware of? Or is the OVERLAPPED struct treated differently somehow?
Posting relevant parts of the code below:
class ThreadedComport : public Comport
{
private:
typedef struct
{
OVERLAPPED overlapped;
ThreadedComport* caller; /* add user data to struct */
} OVERLAPPED_overlap;
OVERLAPPED_overlap _send_overlapped;
OVERLAPPED_overlap _rec_overlapped;
...
static void WINAPI _send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
static void WINAPI _receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
...
};
Open/close is done in a base class that implements neither multi-threading nor asynchronous I/O:
void Comport::open (void)
{
char port[20];
DCB dcbCommPort;
COMMTIMEOUTS ctmo_new = {0};
if(_is_open)
{
close();
}
sprintf(port, "\\\\.\\COM%d", _port_number);
_hcom = CreateFile(port,
GENERIC_READ | GENERIC_WRITE,
0,
0,
OPEN_EXISTING,
0,
0);
if(_hcom == INVALID_HANDLE_VALUE)
{
// error handling
}
GetCommTimeouts(_hcom, &_ctmo_old);
ctmo_new.ReadTotalTimeoutConstant = 10;
ctmo_new.ReadTotalTimeoutMultiplier = 0;
ctmo_new.WriteTotalTimeoutMultiplier = 0;
ctmo_new.WriteTotalTimeoutConstant = 0;
if(SetCommTimeouts(_hcom, &ctmo_new) == FALSE)
{
// error handling
}
dcbCommPort.DCBlength = sizeof(DCB);
if(GetCommState(_hcom, &dcbCommPort) == FALSE)
{
// error handling
}
// setup DCB, this seems to work fine
dcbCommPort.DCBlength = sizeof(DCB);
dcbCommPort.BaudRate = baudrate_int;
if(_parity == PAR_NONE)
{
dcbCommPort.fParity = 0; /* disable parity */
}
else
{
dcbCommPort.fParity = 1; /* enable parity */
}
dcbCommPort.Parity = (uint8)_parity;
dcbCommPort.ByteSize = _databits;
dcbCommPort.StopBits = _stopbits;
SetCommState(_hcom, &dcbCommPort);
}
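open() builds the device path with sprintf and copies _stopbits straight into the DCB. Two details are worth keeping in mind there: the \\.\COMn form is the one CreateFile needs for port numbers above 9, and DCB.StopBits takes a code (ONESTOPBIT = 0, ONE5STOPBITS = 1, TWOSTOPBITS = 2) rather than a count, so if _stopbits happens to hold a plain count, a value of 1 would select 1.5 stop bits. A minimal portable sketch of both conversions, assuming hypothetical helper names make_port_path and stop_bits_code and local copies of the winbase.h constants:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// StopBits codes copied from winbase.h (ONESTOPBIT, ONE5STOPBITS, TWOSTOPBITS)
// so the sketch compiles without <windows.h>; real code should use the macros.
const int kOneStopBit   = 0;
const int kOne5StopBits = 1;
const int kTwoStopBits  = 2;

// Hypothetical helper: builds the "\\.\COMn" device path for CreateFile.
// The "\\.\" prefix is required for COM10 and above and also works for COM1-COM9.
std::string make_port_path(int port_number)
{
    assert(port_number > 0);
    char buf[24];
    std::snprintf(buf, sizeof buf, "\\\\.\\COM%d", port_number);
    return std::string(buf);
}

// Hypothetical helper: maps a stop-bit count (1, 1.5 or 2) to the code that
// DCB.StopBits expects. Returns -1 for any other value.
int stop_bits_code(double stop_bits)
{
    if (stop_bits == 1.0) return kOneStopBit;
    if (stop_bits == 1.5) return kOne5StopBits;
    if (stop_bits == 2.0) return kTwoStopBits;
    return -1;
}
```

In open(), the result of stop_bits_code would go into dcbCommPort.StopBits only after checking it is not -1.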
void Comport::close (void)
{
if(_hcom != NULL && _hcom != INVALID_HANDLE_VALUE)
{
SetCommTimeouts(_hcom, &_ctmo_old);
CloseHandle(_hcom);
_hcom = NULL;
}
_is_open = false;
}
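The read timeouts that open() installs (and close() later restores) follow the rule documented for COMMTIMEOUTS: the total time allowed for a read is ReadTotalTimeoutMultiplier × bytes requested + ReadTotalTimeoutConstant, in milliseconds, and total read timeouts are not used when both values are zero. With the values above (multiplier 0, constant 10), every read, whatever its length, is allowed about 10 ms in total. A small sketch of that arithmetic; the function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the documented COMMTIMEOUTS rule:
// total read timeout (ms) = multiplier * bytes_requested + constant.
// A result of 0 means total read timeouts are not used.
std::uint64_t total_read_timeout_ms(std::uint32_t multiplier,
                                    std::uint32_t constant,
                                    std::uint32_t bytes_requested)
{
    // Widen before multiplying so large requests cannot overflow 32 bits.
    return static_cast<std::uint64_t>(multiplier) * bytes_requested + constant;
}
```

The write fields work the same way, so setting both WriteTotalTimeoutMultiplier and WriteTotalTimeoutConstant to 0, as open() does, disables total write timeouts.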
The whole multi-threading and event-handling mechanism is rather complex; the relevant parts are:
Send
result = WriteFileEx (_hcom, // handle to output file
(void*)_write_data, // pointer to input buffer
send_buf_size, // number of bytes to write
(LPOVERLAPPED)&_send_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_send_callback);
Receive
result = ReadFileEx (_hcom, // handle to the COM port
(void*)_read_data, // pointer to receive buffer
_MAX_MESSAGE_LENGTH, // number of bytes to read
(OVERLAPPED*)&_rec_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_receive_callback);
Callback functions
void WINAPI ThreadedComport::_send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
_this->_data_sent = dwNumberOfBytesTransfered;
}
}
SetEvent(lpOverlapped->hEvent);
}
void WINAPI ThreadedComport::_receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
_this->_bytes_read = dwNumberOfBytesTransfered;
}
}
SetEvent(lpOverlapped->hEvent);
}
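The callbacks recover the owning object by casting lpOverlapped straight to OVERLAPPED_overlap*. That is only valid because overlapped is the first member and because the pointer Windows passes back is the same one that was given to ReadFileEx/WriteFileEx. Win32 code usually spells this recovery with the CONTAINING_RECORD macro from winnt.h, which does not depend on the member's position. The sketch below shows the same pointer arithmetic in portable C++; FakeOverlapped, OwnerSketch, OverlappedWithOwner and from_overlapped are hypothetical names, and FakeOverlapped is merely a stand-in for OVERLAPPED:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for OVERLAPPED (assumption: only here so the sketch compiles
// without <windows.h>; real code uses the OVERLAPPED type itself).
struct FakeOverlapped
{
    unsigned long long internal[3];
    void* hEvent;
};

// Hypothetical owner type, playing the role of ThreadedComport.
struct OwnerSketch
{
    int id;
};

// Same layout idea as OVERLAPPED_overlap: the I/O block plus user data.
struct OverlappedWithOwner
{
    FakeOverlapped overlapped;
    OwnerSketch* caller;
};

// Portable equivalent of CONTAINING_RECORD(p, OverlappedWithOwner, overlapped):
// subtract the member's offset to get back to the enclosing struct. Only valid
// if p really points at the overlapped member of an OverlappedWithOwner.
OverlappedWithOwner* from_overlapped(FakeOverlapped* p)
{
    char* base = reinterpret_cast<char*>(p) - offsetof(OverlappedWithOwner, overlapped);
    return reinterpret_cast<OverlappedWithOwner*>(base);
}
```

Neither the cast nor CONTAINING_RECORD can repair a pointer that is already wrong, which is why a garbage lpOverlapped breaks both callbacks.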
EDIT
Update: I have spent most of the day on the theory that the OVERLAPPED variable went out of scope before the callback was executed. I have verified that this never happens, and I have even tried declaring the OVERLAPPED struct as static; the same problem remains. If the OVERLAPPED struct had gone out of scope, I would expect lpOverlapped to point at the memory location where the struct was previously allocated, but it doesn't: it points at an entirely unfamiliar memory location. Why it does that, I have no idea.
Maybe Windows 7/64 makes an internal copy of the OVERLAPPED struct? I can see how that would cause this behavior, since I am relying on additional fields sneaked in at the end of the struct (which seems like a hack to me, but apparently I got that "hack" from official MSDN examples).
I have also tried changing the calling convention, but that doesn't work at all: with other conventions the program crashes. (The compiler's default convention, presumably __cdecl, crashes, and so does __fastcall.) The conventions that work are __stdcall, WINAPI and CALLBACK. I believe these are all names for __stdcall, and I read somewhere that 64-bit Windows ignores the calling convention anyway.
It looks as if Win 7/64 has some "spurious disturbance" that generates false callback calls with corrupt or irrelevant parameters.
A multi-threading race condition is another theory, but in the scenario I use to reproduce the bug there is only one thread, and I can confirm that the thread calling ReadFileEx is the same one that executes the callback.
I have found the problem; it turned out to be annoyingly simple.
In CreateFile(), I did not specify FILE_FLAG_OVERLAPPED. For reasons unknown, this was not necessary on 32-bit Windows. But if you forget it on 64-bit Windows, Windows apparently still invokes the FileIOCompletionRoutine callback, only with corrupted parameters.
I haven't found this change of behavior documented anywhere; perhaps it was just an internal bug fix in Windows, since the older documentation also states that FILE_FLAG_OVERLAPPED must be set.
In my specific case, the bug appeared because I had a base class that assumed synchronous I/O, which was then inherited by a class using asynchronous I/O.
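The root cause was a synchronous base class opening the handle on behalf of a derived class that issues overlapped I/O. One way to keep the two in sync, sketched here with hypothetical class names (PortBase, AsyncPort) and a local copy of the FILE_FLAG_OVERLAPPED value (0x40000000 in winbase.h), is to let the base class ask a virtual function for CreateFile's dwFlagsAndAttributes argument instead of hard-coding 0:

```cpp
#include <cassert>
#include <cstdint>

// Value of FILE_FLAG_OVERLAPPED from winbase.h, copied so the sketch
// compiles without <windows.h> (assumption: real code uses the macro).
const std::uint32_t kFileFlagOverlapped = 0x40000000u;

// Hypothetical base class: opens the port for synchronous I/O by default.
class PortBase
{
public:
    virtual ~PortBase() = default;

    // dwFlagsAndAttributes that open() would pass to CreateFile.
    virtual std::uint32_t creation_flags() const { return 0; }
};

// Hypothetical derived class that uses ReadFileEx/WriteFileEx and therefore
// needs the handle opened for overlapped I/O.
class AsyncPort : public PortBase
{
public:
    std::uint32_t creation_flags() const override { return kFileFlagOverlapped; }
};
```

open() would then pass creation_flags() to CreateFile. The virtual call resolves to the derived override from an ordinary member function such as open(), but not from the base-class constructor, where it would still return the synchronous base value.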