What is the difference between _tcslen and _tcsclen?

Question

I'm developing an application that has to be compatible with different character encodings. To do that, I always use TCHAR* instead of char* to define strings, and therefore I use _tcslen to get the size of my strings.

Today, I saw in my company's version control system that one of my coworkers had edited the line where I wrote _tcslen to use _tcsclen instead.

The only link I found that discusses this function is this one, and it doesn't explain the difference between the two functions.

Can someone explain the difference between _tcslen and _tcsclen?

Roger Lipscombe · Accepted Answer

The _t prefix means that these are text handling functions (actually macros) that map to different implementations, depending on whether you're compiling for "Unicode" (actually UTF-16) or not.

When you're compiling for Unicode (_UNICODE is set), they map to the same function, wcslen, which returns the length of the string in wide (two-byte) characters.
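One practical consequence is that a length returned by wcslen is a count of wchar_t elements, not bytes, so it has to be scaled by sizeof(wchar_t) before it is used as an allocation size. A minimal sketch of that, where dup_wide is a hypothetical helper name and not part of any library:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: duplicate a wide string.
   wcslen counts wchar_t elements (excluding the terminator), not bytes,
   so the allocation size is (length + 1) * sizeof(wchar_t). */
wchar_t *dup_wide(const wchar_t *src)
{
    size_t len = wcslen(src);                    /* elements, not bytes */
    size_t bytes = (len + 1) * sizeof(wchar_t);  /* +1 for the terminator */
    wchar_t *copy = malloc(bytes);
    if (copy != NULL) {
        wmemcpy(copy, src, len + 1);             /* copies the terminator too */
    }
    return copy;
}
```

Using sizeof(wchar_t) rather than a literal 2 keeps the sketch correct on platforms where wchar_t is wider than it is on Windows. The caller is responsible for calling free on the result.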

When you're compiling for a multi-byte character set instead (_MBCS is set), they map to different functions (if neither _UNICODE nor _MBCS is set, both map to strlen):

_tcslen maps to strlen, which returns the length of the string in bytes. This is intended so that you can allocate buffers of the correct size.
_tcsclen maps to _mbslen, the documentation for which is fairly sparse. I'm guessing, however, that the c in _tcsclen is intended to mean characters.

The difference between characters and bytes is that, in a multi-byte encoding, a single character can take more than one byte (one or two in double-byte code pages such as Shift-JIS). Thus: _tcsclen (_mbslen) tells you how many characters are in the string, which is useful for rendering, and _tcslen (strlen) tells you how many bytes are in the string, which you need for memory allocation.
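To make the byte/character distinction concrete, here is a rough, portable sketch that counts characters in a Shift-JIS string by hand. It is only an approximation of what _mbslen would report under a Shift-JIS code page, not the CRT's implementation; sjis_char_count, is_sjis_lead_byte, and sample_text are names made up for this illustration.

```c
#include <stddef.h>
#include <string.h>  /* strlen, used for comparison in the usage note */

/* Returns nonzero if b is a Shift-JIS lead byte, i.e. the first byte of
   a two-byte character: 0x81-0x9F or 0xE0-0xFC. */
static int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical stand-in for _mbslen under a Shift-JIS code page:
   counts characters rather than bytes. */
size_t sjis_char_count(const char *s)
{
    size_t count = 0;
    while (*s != '\0') {
        if (is_sjis_lead_byte((unsigned char)*s) && s[1] != '\0') {
            s += 2;  /* double-byte character */
        } else {
            s += 1;  /* single-byte character (or a truncated lead byte) */
        }
        count++;
    }
    return count;
}

/* "abc" followed by two double-byte characters encoded in Shift-JIS. */
const char sample_text[] = "abc" "\x93\xFA\x96\x7B";
```

For sample_text, strlen returns 7 (the number of bytes to allocate, excluding the terminator), while sjis_char_count returns 5 (the number of characters a user would see).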

In general, if you're working primarily on Windows, you'll just compile for Unicode and be done with it. You only need to deal with other character encodings if you're talking to another system (reading/writing files, network messages, etc.), and you'll usually convert to and from UTF-8.

Note that when the Windows SDK documentation refers to "multi-byte", it means older multi-byte encodings, such as Shift-JIS, rather than UTF-8 (which is also a multi-byte encoding).
