Benjamin

Reputation: 10393

Count of bytes vs Count of Chars

Some APIs require a count of characters (cch).

// Why did they choose cch for these functions?
HRESULT StringCchCopyW(
  __out  LPWSTR pszDest,
  __in   size_t cchDest,
  __in   LPCWSTR pszSrc
);

errno_t wcscpy_s(
   wchar_t *strDestination,
   size_t numberOfElements,
   const wchar_t *strSource 
);

DWORD WINAPI GetCurrentDirectoryW(
  __in   DWORD nBufferLength, // Count of Chars
  __out  LPWSTR lpBuffer
);  

And some APIs require a count of bytes (cb).

// Which do you prefer: the cch functions or the cb functions?
// Are the cch functions usually the more useful ones?
HRESULT StringCbCopyW(
  __out  LPWSTR pszDest,
  __in   size_t cbDest,
  __in   LPCWSTR pszSrc
);

BOOL WINAPI ReadFile(
  __in         HANDLE hFile,
  __out        LPVOID lpBuffer,
  __in         DWORD nNumberOfBytesToRead,
  __out_opt    LPDWORD lpNumberOfBytesRead,
  __inout_opt  LPOVERLAPPED lpOverlapped
);

// Why did they choose cb for these structures?
// Because some APIs use cb, I always have to check MSDN.
typedef struct _LSA_UNICODE_STRING {
  USHORT Length; // Count of bytes.
  USHORT MaximumLength; // Count of bytes.
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

typedef struct _FILE_RENAME_INFO {
  BOOL   ReplaceIfExists;
  HANDLE RootDirectory;
  DWORD  FileNameLength; // Count of bytes.
  WCHAR  FileName[1];
} FILE_RENAME_INFO, *PFILE_RENAME_INFO;
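
For example, this is roughly the code I end up writing (just a sketch; the helper name is made up), and I have to remember that FileNameLength is in bytes, not characters:

#include <windows.h>
#include <wchar.h>
#include <string.h>

// Illustrative helper (not a real API). Assumes the caller already allocated
// 'info' with enough room after the struct to hold the name.
void FillRenameInfo(FILE_RENAME_INFO *info, LPCWSTR newName)
{
    info->ReplaceIfExists = TRUE;
    info->RootDirectory   = NULL;
    info->FileNameLength  = (DWORD)(wcslen(newName) * sizeof(WCHAR)); // cb, not cch!
    memcpy(info->FileName, newName, info->FileNameLength);
}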

When you design a function or a data structure, how do you decide between cb and cch, and why?
What should I know about this in order to design a better API for callers?

Upvotes: 0

Views: 2681

Answers (2)

Mac

Reputation: 14791

If you notice, the first group of functions you mention are all ASCII functions, and so in that case there is no difference - the count of bytes is the count of characters. That is because (generally, anyway) a single ASCII character is exactly one byte in size.

The second group are Unicode functions/structs. In this case, the characters are not guaranteed to be only a single byte in size: in UTF-16 they'll be two bytes wide, in UTF-32 they'll be four, and in UTF-8 they'll (typically) be anywhere from one to four bytes wide.

Particularly in the case of UTF-8 data, when you create a buffer you usually set aside a certain number of bytes, which, depending on character sizes, could correspond to quite a range of character counts. I'm not overly familiar with most of the functions/structs you've presented, but it wouldn't surprise me if that has something to do with it.

To answer your question: if you're working with ASCII, you can use either approach, as it makes no difference. If you're working with variable-length encodings, however (such as UTF-8), which one you use depends on whether you're interested in just the characters involved or whether you also need to take their encoding into account.
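
To put rough numbers on that (a standalone sketch, not tied to any of the APIs above), the same four-character word gives different byte counts depending on the encoding, which is exactly why a count of bytes and a count of characters can disagree:

#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    // "café": 4 characters, but the byte count depends on the encoding.
    const char    *utf8 = "caf\xC3\xA9";   // UTF-8: 5 bytes
    const wchar_t *wide = L"caf\xE9";      // UTF-16 on Windows: 4 wchar_t, 8 bytes

    printf("UTF-8 : %u bytes\n", (unsigned)strlen(utf8));
    printf("UTF-16: %u chars, %u bytes\n",
           (unsigned)wcslen(wide), (unsigned)(wcslen(wide) * sizeof(wchar_t)));
    return 0;
}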

Upvotes: 0

user541686

Reputation: 210445

If the data returned is a string, you should return the count of chars, since the number of bytes is often useless. But if it's generic binary data (and not specifically a string), then obviously the number of chars doesn't make any sense, so use the number of bytes.
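
To make that concrete, here is a rough sketch (the two wrapper names are made up; only StringCchCopyW and ReadFile are real APIs):

#include <windows.h>
#include <strsafe.h>

// String result: the buffer size is a count of characters (cch).
HRESULT CopyNameToBuffer(LPCWSTR pszName, LPWSTR pszDest, size_t cchDest)
{
    return StringCchCopyW(pszDest, cchDest, pszName);      // cch: number of WCHARs in pszDest
}

// Raw binary result: the buffer size is a count of bytes (cb).
BOOL ReadBlob(HANDLE hFile, BYTE *pbDest, DWORD cbDest, DWORD *pcbRead)
{
    return ReadFile(hFile, pbDest, cbDest, pcbRead, NULL);  // cb: number of bytes in pbDest
}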

As to why:

I believe the reason LSA_UNICODE_STRING holds a number of bytes is that it's meant to be compatible with UNICODE_STRING, which in turn is used by NtCreateFile. But NtCreateFile accepts a FILE_OPEN_BY_FILE_ID flag that actually treats the UNICODE_STRING as pointing to a LONGLONG value, not a string... so the number of bytes made more sense there, although I'd say it was overall a poor design:

FILE_OPEN_BY_FILE_ID: The file name that is specified by the ObjectAttributes parameter includes the 8-byte file reference number for the file.
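
Either way, it means that whenever you fill a UNICODE_STRING yourself you have to do the cch-to-cb conversion by hand. A sketch (the real RtlInitUnicodeString does essentially this for you):

#include <windows.h>
#include <winternl.h>   // UNICODE_STRING
#include <wchar.h>

// Sketch of initializing a UNICODE_STRING: both length fields are byte counts.
void InitUnicodeStringByHand(UNICODE_STRING *us, PWSTR s)
{
    size_t cch = wcslen(s);
    us->Buffer        = s;
    us->Length        = (USHORT)(cch * sizeof(WCHAR));         // cb, without the NUL
    us->MaximumLength = (USHORT)((cch + 1) * sizeof(WCHAR));   // cb, including the NUL
}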

Upvotes: 3
