I am trying to read from a file using the Windows function ReadFile(), but when I print the message it prints too many characters.
It doesn't matter whether I read from an ANSI file or a UNICODE file; I don't get the right characters.
The text in the file is: "This is a text file".
[Screenshot: output for the ANSI file]
[Screenshot: output for the UNICODE file]
What am I doing wrong?
#include <windows.h>
#include <stdio.h>
#include <tchar.h>

#define BUFSIZE 4000

int _tmain(int argc, TCHAR *argv[])
{
    HANDLE hIn;
    TCHAR buffer[BUFSIZE];
    DWORD nIn = 0;

    // the file name is expected as the first argument
    if (argc < 2)
    {
        _tprintf(TEXT("usage: %s <file>\n"), argv[0]);
        return 1;
    }

    // open the existing file for reading
    hIn = CreateFile(argv[1],
                     GENERIC_READ,
                     FILE_SHARE_READ,
                     NULL,
                     OPEN_EXISTING,
                     FILE_ATTRIBUTE_NORMAL,
                     NULL);

    // check the handle; there is nothing to read if opening failed
    if (hIn == INVALID_HANDLE_VALUE)
    {
        printf("\nOpen file error\n");
        return 1;
    }

    // read from file
    if (FALSE == ReadFile(hIn, buffer, BUFSIZE - 1, &nIn, NULL))
    {
        printf("Terminal failure: Unable to read from file.\n GetLastError=%08lx\n", GetLastError());
        CloseHandle(hIn);
        return 0;
    }

    if (nIn > 0 && nIn <= BUFSIZE - 1)
    {
        buffer[nIn] = TEXT('\0'); // terminate the string
        _tprintf(TEXT("Data read from %s (%lu bytes): \n"), argv[1], nIn);
    }
    else if (nIn == 0)
    {
        _tprintf(TEXT("No data read from file %s\n"), argv[1]);
    }
    else
    {
        printf("\n ** Unexpected value for nIn ** \n");
    }

    printf("1:%s\n", buffer);
    _tprintf(TEXT("\n2:%s"), buffer);

    CloseHandle(hIn);
    return 0;
}
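The Windows API function ReadFile() reads raw bytes (unsigned char) and knows nothing about TCHAR: both the count you ask for and the count it returns in nIn are numbers of bytes. TCHAR, on the other hand, is a two-byte wchar_t when the program is compiled with UNICODE defined, which is the default for modern Visual Studio projects, and a one-byte char only in an ANSI build, as was usual for Windows 95 programs. In a UNICODE build, which your output suggests you have, buffer[nIn] indexes two-byte units, so your terminator lands at byte offset 2 * nIn, and the uninitialized bytes between the end of the data and that offset get printed as extra characters. So you need to make the following modifications.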
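The effect of indexing a TCHAR array with a byte count can be reproduced without Windows. The sketch below is a simulation under stated assumptions, not Windows code: uint16_t stands in for a UNICODE-build TCHAR, a memset with 0xCC stands in for uninitialized stack memory (similar to the pattern MSVC debug builds use), and len16 and buggy_length are hypothetical helpers. It copies nIn raw bytes and then terminates at buffer[nIn], as the question's code does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: uint16_t stands in for TCHAR in a UNICODE build
   (a 2-byte wchar_t on Windows), so the sketch compiles anywhere. */
typedef uint16_t tchar16;

#define UNITS 64 /* hypothetical small buffer size, in 2-byte units */

/* Counts 2-byte units up to the first zero unit, like wcslen(). */
size_t len16(const tchar16 *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Mimics the question: nIn is a BYTE count (what ReadFile reports),
   but buffer[nIn] indexes 2-byte units. Returns the string length
   that the printing code would see. */
size_t buggy_length(const char *bytes, size_t nIn)
{
    tchar16 buffer[UNITS];

    if (nIn >= UNITS)
        return 0;                        /* keep buffer[nIn] in bounds */

    memset(buffer, 0xCC, sizeof buffer); /* stand-in for uninitialized memory */
    memcpy(buffer, bytes, nIn);          /* copy nIn raw bytes */
    buffer[nIn] = 0;                     /* terminator lands at byte 2 * nIn */
    return len16(buffer);
}
```

For the 19-byte string "This is a text file", buggy_length returns 19 units (38 bytes): the text fills only the first nine and a half units, and the rest is 0xCC filler, which corresponds to the extra characters printed in the question.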
See also the question "What is the difference between _tmain() and main() in C++?", which has additional information about the different compilation targets for Windows and the character encodings they use.
First of all, your buffer should be of type BYTE, not TCHAR.
Secondly, you need to make sure that it is zero-filled, so initialize the buffer with BYTE buffer[BUFSIZE] = {0};
Since Windows UNICODE text is UTF-16, with two bytes per code unit, the end-of-string marker for a UNICODE string is two bytes of binary zero, not one, and you need to take this into account in your buffer length. When you place the terminator, write two zero bytes.
You should read BUFSIZE - 2 bytes, which leaves room for that two-byte terminator and requests an even number of bytes in case you are reading a UNICODE string. For this to work, BUFSIZE itself should be a multiple of two, which 4000 is.
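Put together, the modifications above amount to something like the following portable sketch. Assumptions: standard C fread stands in for ReadFile and unsigned char for BYTE so the code compiles off Windows, and read_raw is a hypothetical helper name. The important part is that every count and index is in bytes.

```c
#include <stdio.h>
#include <string.h>

#define BUFSIZE 4000 /* even, as in the question */

/* Reads at most BUFSIZE - 2 raw bytes from fp into buffer (which must
   hold BUFSIZE bytes) and returns how many bytes were read. */
size_t read_raw(FILE *fp, unsigned char buffer[BUFSIZE])
{
    size_t nIn;

    memset(buffer, 0, BUFSIZE);              /* same effect as BYTE buffer[BUFSIZE] = {0}; */
    nIn = fread(buffer, 1, BUFSIZE - 2, fp); /* nIn is a byte count */

    /* Two zero bytes right after the data end a char string; together
       with the zero fill they also end a UTF-16 string, even if nIn is
       odd. nIn <= BUFSIZE - 2, so both writes stay in bounds. */
    buffer[nIn] = 0;
    buffer[nIn + 1] = 0;
    return nIn;
}
```

On Windows the same shape applies: declare BYTE buffer[BUFSIZE] = {0};, call ReadFile(hIn, buffer, BUFSIZE - 2, &nIn, NULL), and treat nIn strictly as a byte count before deciding whether the bytes are ANSI or UTF-16 text.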
If the file holds an ANSI string and you display it as UNICODE, it will probably look like garbage, because each pair of consecutive ANSI bytes is interpreted as a single UTF-16 character.
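To see the pairing concretely, take the ANSI bytes of "This is a text file" and read them two at a time as little-endian UTF-16, which is effectively what a wide-string print does. utf16le_unit is a hypothetical helper; it assembles each unit from two bytes so the result does not depend on the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns UTF-16 code unit i of a byte buffer, read little-endian
   (the byte order Windows uses for wchar_t). The caller must ensure
   bytes holds at least 2 * i + 2 bytes. */
uint16_t utf16le_unit(const unsigned char *bytes, size_t i)
{
    return (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}
```

'T' (0x54) and 'h' (0x68) combine into 0x6854, and "is" into 0x7369. Both values fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is why ANSI text displayed as UNICODE often appears as a run of CJK characters.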
So to make the output the same for both kinds of file, you need to convert between the two character encodings, for example with MultiByteToWideChar() for ANSI text. See the article "Using Byte Order Marks" for how a marker at the start of a text file indicates which character encoding the file uses.
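A byte order mark check can be sketched as follows. This is an illustration, not a Windows API: the text_encoding enum and detect_bom function are hypothetical names, and only the UTF-8 and UTF-16 marks are handled (UTF-32 marks are ignored).

```c
#include <stddef.h>

/* Hypothetical enum for the encodings this sketch recognizes. */
typedef enum {
    ENC_UNKNOWN, /* no BOM: often ANSI, but could be BOM-less UTF-8 */
    ENC_UTF8,    /* EF BB BF */
    ENC_UTF16LE, /* FF FE */
    ENC_UTF16BE  /* FE FF */
} text_encoding;

/* Inspects the first n bytes of buf; *bom_len receives the size of the
   mark so the caller can skip it before converting or printing. */
text_encoding detect_bom(const unsigned char *buf, size_t n, size_t *bom_len)
{
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    *bom_len = 0;
    return ENC_UNKNOWN;
}
```

Notepad's "Unicode" (UTF-16 LE) option writes an FF FE mark, so such a file is reported as ENC_UTF16LE with its text starting at offset 2. For ENC_UNKNOWN on Windows, MultiByteToWideChar(CP_ACP, ...) is one way to turn the ANSI bytes into UTF-16 before printing them with the wide functions; a missing BOM is only a hint, not proof, that the file is ANSI.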