Yongwei Xing

Reputation: 13451

Different char type in windows programming

Recently I have been working on some tasks involving chars and strings on the Windows platform. I see that there are many different character types, such as char, TCHAR, WCHAR, LPSTR, LPWSTR and LPCTSTR. Can someone explain what they are, and how to use them the way I would use a regular char and char *? I am confused about these types.

Best Regards,

Upvotes: 3

Views: 3394

Answers (4)

Alexandru

Reputation: 12902

Let me try to shed some light (I've blogged this on my site at https://www.dima.to/blog/?p=190 in case you want to check it out):

#include "stdafx.h"
#include "Windows.h"

int _tmain(int argc, _TCHAR* argv[])
{
    /* Quick Tutorial on Strings in Microsoft Visual C++

       The Unicode Character Set and Multibyte Character Set options in MSVC++ provide a project with two flavours of string encodings. They will use different encodings for characters in your project. Here are the two main character types in MSVC++ that you should be concerned about:

       1. char <-- char characters use an 8-bit character encoding (8 bits = 1 byte) according to MSDN.
       2. wchar_t <-- wchar_t uses a 16-bit character encoding (16 bits = 2 bytes) according to MSDN.

       From above, we can see that the size of each character in our strings will change depending on our chosen character set.

       WARNING: Do NOT assume that every character in a Multibyte or Unicode string occupies exactly one char or one wchar_t. That depends on the encoding. A multibyte string is stored as a sequence of bytes, but a single character may need more than one byte, so the byte at a given index is not necessarily a complete character. MSDN says: "A multibyte-character string may contain a mixture of single-byte and double-byte characters. A two-byte multibyte character has a lead byte and a trail byte." The same applies to Unicode strings: in UTF-16, characters outside the Basic Multilingual Plane take two wchar_t units (a surrogate pair).

       WARNING: Do NOT assume that Unicode contains every character for every language. For more information, please see http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character.

       Note: The ASCII Character Set is a subset of both Multibyte and Unicode Character Sets (in other words, both of these flavours encompass ASCII characters).
       Note: You should always use Unicode for new development, according to MSDN. For more information, please see http://msdn.microsoft.com/en-us/library/ey142t48.aspx.
    */
    // Strings that are Multibyte.
    LPSTR a; // Regular Multibyte string (synonymous with char *).
    LPCSTR b; // Constant Multibyte string (synonymous with const char *).
    // Strings that are Unicode.
    LPWSTR c; // Regular Unicode string (synonymous with wchar_t *).
    LPCWSTR d; // Constant Unicode string (synonymous with const wchar_t *).
    // Strings that take on either Multibyte or Unicode depending on project settings.
    LPTSTR e; // Multibyte or Unicode string (can be either char * or wchar_t *).
    LPCTSTR f; // Constant Multibyte or Unicode string (can be either const char * or const wchar_t *).
    /* From above, it is safe to assume that the pattern is as follows:

       LP: Specifies a long pointer type (this is synonymous with prefixing this type with a *).
       W: Specifies that the type is of the Unicode Character Set.
       C: Specifies that the type is constant.
       T: Specifies that the type has a variable encoding.
       STR: Specifies that the type is a string type.
    */
    // String literal forms (literals are constant, so they are assigned to the constant pointer types):
    f = _T("Example."); // Multibyte or Unicode literal depending on project settings.
    f = TEXT("Example."); // Multibyte or Unicode literal depending on project settings (same as _T).
    d = L"Example."; // Unicode (wide) literal.
    b = "Example."; // Multibyte (narrow) literal.
    return 0;
}
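To make the warning about character sizes concrete, here is a minimal portable sketch. It is an illustration under assumptions, not part of the program above: it uses UTF-8 as a stand-in for a Windows multibyte code page and char16_t as a stand-in for the 16-bit wchar_t on Windows, and the helper names utf8_code_points and utf16_code_points are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Counts UTF-16 code points by skipping low surrogates (0xDC00-0xDFFF),
// which only ever appear as the second half of a surrogate pair.
// Assumes the input is valid UTF-16.
std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) ++n;
    }
    return n;
}
```

For example, the letter é is the two bytes "\xC3\xA9" in UTF-8, so size() reports 2 while utf8_code_points reports 1. Likewise U+1F600 is the surrogate pair 0xD83D 0xDE00 in UTF-16, so the string holds two 16-bit units but only one code point.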

Upvotes: 0

In silico

Reputation: 52179

They are documented on MSDN. Here are a few:

  • TCHAR: A WCHAR if UNICODE is defined, a CHAR otherwise.
  • WCHAR: A 16-bit Unicode character.
  • CHAR: An 8-bit Windows (ANSI) character.
  • LPTSTR: An LPWSTR if UNICODE is defined, an LPSTR otherwise.
  • LPSTR: A pointer to a null-terminated string of 8-bit Windows (ANSI) characters.
  • LPWSTR: A pointer to a null-terminated string of 16-bit Unicode characters.
  • LPCTSTR: An LPCWSTR if UNICODE is defined, an LPCSTR otherwise.
  • LPCWSTR: A pointer to a constant null-terminated string of 16-bit Unicode characters.
  • LPCSTR: A pointer to a constant null-terminated string of 8-bit Windows (ANSI) characters.

Note that some of these types map to something different depending on whether UNICODE has been #define'd. By default, they resolve to the ANSI versions:

#include <windows.h>
// LPCTSTR resolves to LPCSTR

When you #define UNICODE before #include <windows.h>, they resolve to the Unicode versions.

#define UNICODE
#include <windows.h>
// LPCTSTR resolves to LPCWSTR

They are in reality typedefs to some fundamental types in the C and C++ language. For example:

typedef char CHAR;
typedef wchar_t WCHAR;

On compilers like Visual C++, there's really no difference between an LPCSTR and a const char*, or between an LPCWSTR and a const wchar_t*. This might differ between compilers, however, which is why these data types exist in the first place!

It's sort of like the Windows API equivalent of <cstdint> or <stdint.h>. The Windows API has bindings in other languages, and having data types with a known size is useful, if not required.
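As a rough illustration of how the UNICODE switch selects between the two families, here is a sketch that mimics the mechanism with made-up names. Everything prefixed with Demo or DEMO_ is hypothetical and only stands in for the real Windows typedefs and the TEXT macro, so that the example compiles without <windows.h>; the actual headers are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for CHAR and WCHAR.
typedef char    DemoCHAR;
typedef wchar_t DemoWCHAR;

// DEMO_UNICODE plays the role of UNICODE: it picks which type the
// generic name resolves to, and which kind of literal DEMO_TEXT produces.
#ifdef DEMO_UNICODE
typedef DemoWCHAR DemoTCHAR;
#define DEMO_TEXT(s) L##s
#else
typedef DemoCHAR DemoTCHAR;
#define DEMO_TEXT(s) s
#endif

// Pointer typedefs built on top of the generic character type.
typedef DemoTCHAR*       DemoLPTSTR;
typedef const DemoTCHAR* DemoLPCTSTR;

// Counts the code units before the terminating null, for either width.
std::size_t demo_length(DemoLPCTSTR s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != DemoTCHAR(0)) ++n;
    return n;
}

// True when the generic type resolved to the wide version.
constexpr bool demo_is_wide() {
    return std::is_same<DemoTCHAR, wchar_t>::value;
}
```

Calling demo_length(DEMO_TEXT("Example.")) compiles unchanged in both configurations, which is the point of the generic names: the same source builds as either an ANSI or a Unicode program. Defining DEMO_UNICODE before the typedefs flips demo_is_wide() to true.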

Upvotes: 14

Emil H

Reputation: 40230

TCHAR, LPTSTR and LPCTSTR are all generic types that will be either regular character strings or wide character strings depending on whether or not UNICODE is defined. CHAR, LPSTR and LPCSTR are regular character strings. WCHAR, LPWSTR and LPCWSTR are wide character strings. TCHAR, CHAR and WCHAR represent a single character. LPTSTR, LPSTR and LPWSTR are "Long Pointer to STRing". LPCTSTR, LPCSTR and LPCWSTR are constant string pointers.

Upvotes: 0

Mark Ransom

Reputation: 308392

char is the standard 8-bit character type.

On Windows, wchar_t is a 16-bit type holding Unicode UTF-16 code units, used since about Windows 95. WCHAR is another name for it.

TCHAR can be either one, depending on your compiler settings. Most of the time in a modern program it's wchar_t.

The P and LP prefixes are pointers to the different types. The L is legacy (stands for Long pointer), and became obsolete with Windows 95; you still see it quite a bit though.

The C after the prefix stands for const.
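Since the pointer types are just aliases, they can be passed to the usual C and C++ string functions. The following is a small sketch of that idea, not a definitive pattern. The typedefs in the #else branch are stand-ins (an assumption) so the code also builds off Windows, and the function names narrow_length, wide_length and replace_char are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-in typedefs (assumption) mirroring the Windows definitions.
typedef char*          LPSTR;
typedef const char*    LPCSTR;
typedef wchar_t*       LPWSTR;
typedef const wchar_t* LPCWSTR;
#endif

// LPCSTR is a const char*, so the narrow C string functions accept it.
std::size_t narrow_length(LPCSTR s) {
    return std::strlen(s);
}

// LPCWSTR is a const wchar_t*, so the wide counterparts accept it.
std::size_t wide_length(LPCWSTR s) {
    return std::wcslen(s);
}

// LPSTR has no C in its name, so the string it points to is writable.
// Replaces every occurrence of `from` with `to` in place.
void replace_char(LPSTR s, char from, char to) {
    assert(s != nullptr);
    for (; *s != '\0'; ++s) {
        if (*s == from) *s = to;
    }
}
```

A string literal can be passed directly where an LPCSTR or LPCWSTR is expected, but a function taking LPSTR needs a modifiable buffer, for example a char array initialized from the literal.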

Upvotes: 2
