Brandon Nece

Reputation: 41

How to read a character not included in ASCII in C++?

I'm going through a folder of files and editing their titles. I am trying to remove a certain piece of each title, but the bracket that separates it from the rest of the title is not a standard ASCII character, so I can't figure out a way of removing it. This is a sample title: 【Remove this portion】keep this portion. I've included the code I'm using. I store the title in a CString and then call CString::Find() to locate the portion, but it is unable to locate that type of bracket.

    // Handle and data structure for the file search
    HANDLE hfind;
    WIN32_FIND_DATA data;

    // Builds the search pattern for the files to process
    CString FileFormat = FolderPath + Format;
    CString NewTitle, PulledFile;

    // Retrieves the first matching file
    hfind = FindFirstFile(FileFormat, &data);

    // Runs the loop only if the handle is valid
    if (hfind != INVALID_HANDLE_VALUE)
    {
        // Loops until it hits the end of the folder
        do {
            PulledFile = data.cFileName;
            if (PulledFile.Find(L'【') != -1)
            {
                // Drops leading characters until no closing bracket remains
                while (PulledFile.Find(L'】') != -1)
                {
                    PulledFile = PulledFile.Right(PulledFile.GetLength() - 1);
                }
            }
            NewTitle = PulledFile.Left(PulledFile.GetLength() - (Format.GetLength() + 9));
            // Adds the title to the vector if it is not empty
            if (!NewTitle.IsEmpty())
            {
                v.push_back(NewTitle);
            }
        } while (FindNextFile(hfind, &data));
        // Releases the search handle
        FindClose(hfind);
    }

Upvotes: 3

Views: 181

Answers (2)

meneldal

Reputation: 1747

The most likely issue is that your project is not compiled with the character set you expect. According to the CString documentation:

A CStringW object contains the wchar_t type and supports Unicode strings. A CStringA object contains the char type, and supports single-byte and multi-byte (MBCS) strings. A CString object supports either the char type or the wchar_t type, depending on whether the MBCS symbol or the UNICODE symbol is defined at compile time.

The actual underlying character type therefore depends on your compilation parameters. What is most likely happening is that a Unicode string is being compared with a literal that was encoded as MBCS, so the search finds no match.

To fix this, decide whether you want to use Unicode or MBCS and update your compilation parameters accordingly, defining either MBCS or UNICODE.

If you use Unicode, you will probably have to change your character literal, because as written it may only work for MBCS. You can either use the code point L'\u3010', which yields the correct character regardless of how the source file is saved, or make sure your source file uses a Unicode encoding and use u'【'.
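To illustrate why the character type matters, here is a minimal portable sketch in standard C++ (not MFC). It shows that the bracket is a single wide character with the value 0x3010, whereas in a multi-byte encoding it occupies several bytes and cannot be matched by one `char`. UTF-8 is used here only as a stand-in for a multi-byte encoding (an MBCS build on Windows would use the active code page instead), and the names `kOpenBracket`, `utf8_open_bracket`, `has_open_bracket` and `demo_encodings` are hypothetical.

```cpp
#include <cassert>
#include <string>

// U+3010 written as a universal character name, so the value does not
// depend on how the source file is saved.
constexpr wchar_t kOpenBracket = L'\u3010';
static_assert(kOpenBracket == 0x3010, "wide literal holds the code point");

// The same character as UTF-8 bytes, spelled with hex escapes.
// Assumption: UTF-8 stands in for a multi-byte encoding here.
inline std::string utf8_open_bracket() { return "\xE3\x80\x90"; }

// Hypothetical helper: true if the wide title contains the opening bracket.
inline bool has_open_bracket(const std::wstring& title) {
    return title.find(kOpenBracket) != std::wstring::npos;
}

// Small self-check of the points above.
inline void demo_encodings() {
    assert(utf8_open_bracket().size() == 3);          // three bytes when narrow
    assert(has_open_bracket(L"\u3010tag\u3011name")); // one wide character
    assert(!has_open_bracket(L"name"));
}
```

With CString the same idea would apply to `Find(L'\u3010')`, provided the project is built so that CString holds wchar_t.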

Upvotes: 2

selbie

Reputation: 104569

Most likely your editor isn't encoding the hardcoded 【 and 】 as the Unicode characters you intend. Visual Studio sometimes gets this right by automatically saving the source file as UTF-8, but that isn't always reliable and may not survive a source control system that expects ASCII.

The easiest thing to do is to use the \uNNNN syntax to match the characters.

    if(PulledFile.Find(L'\u3010') != -1)
    {
        while (PulledFile.Find(L'\u3011') != -1)
        {
            PulledFile = PulledFile.Right(PulledFile.GetLength() - 1);
        }
    }

Here, \u3010 and \u3011 are the hexadecimal escape sequences for the Unicode code points of 【 and 】, respectively.
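As a variation on the loop above, the bracketed portion could also be removed in one step by locating both brackets and cutting out the span between them. The following is only a sketch under a few assumptions: `std::wstring` stands in for CString so that it compiles without MFC, the function names `strip_bracketed_tag` and `demo_strip` are hypothetical, and it removes the span up to the first closing bracket, whereas the loop above keeps trimming until no closing bracket is left.

```cpp
#include <cassert>
#include <string>

// Removes the first "【...】" tag from a title, if a complete one is present.
// Assumption: std::wstring is used as a portable stand-in for CString.
inline std::wstring strip_bracketed_tag(const std::wstring& title) {
    const wchar_t open = L'\u3010';   // left bracket
    const wchar_t close = L'\u3011';  // right bracket
    const std::wstring::size_type start = title.find(open);
    if (start == std::wstring::npos) return title;  // no tag at all
    const std::wstring::size_type end = title.find(close, start + 1);
    if (end == std::wstring::npos) return title;    // tag never closes
    // Keep whatever precedes the tag plus everything after the closing bracket.
    return title.substr(0, start) + title.substr(end + 1);
}

// Small self-check using the sample title from the question.
inline void demo_strip() {
    assert(strip_bracketed_tag(L"\u3010Remove this portion\u3011keep this portion")
           == L"keep this portion");
    assert(strip_bracketed_tag(L"no brackets") == L"no brackets");
}
```

In an MFC build the equivalent calls would presumably be `Find` for the two positions and `Left`/`Mid` for the two pieces.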

Upvotes: 2
