How to parse UTF-8 Chinese string

Question

I am trying to parse a std::string that might contain Chinese characters. For example, given the string

哈囉hi你好hello

I want to split it into 6 strings: 哈, 囉, hi, 你, 好, and hello. The string is read from a text file with getline(). Based on the post "How to use boost::spirit to parse UTF-8?", here is my current code:

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex/pending/unicode_iterator.hpp>
#include <boost/spirit/include/qi.hpp>

using namespace boost;
using namespace std;
using namespace std::string_literals; 

int main()
{
    string str = u8"哈囉hi你好hello"; //actually got from getline()
    auto &&utf8_text = str;

    u8_to_u32_iterator<const char*>
        tbegin(begin(utf8_text)), tend(end(utf8_text));

    vector<uint32_t> result;
    spirit::qi::parse(tbegin, tend, *spirit::standard_wide::char_, result);
    for(auto &&code_point : result) {
        cout << code_point << ";";
    }
}

But I get the error "call to 'begin' and 'end' is ambiguous". It works when I declare auto &&utf8_text = u8"哈囉hi你好hello"; directly, but I can't do that because the content of the string comes from getline().
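The ambiguity most likely comes from the two using-directives rather than from the Chinese text. With using namespace std; and using namespace boost;, an unqualified begin(str) sees both std::begin and Boost.Range's boost::begin (presumably pulled in through the Spirit headers). Both are generic templates taking the range by reference, so for a std::string neither is a better match. For a string literal, std::begin has a dedicated array overload that wins, which would explain why the literal version compiles. The sketch below reproduces the situation without Boost, using a hypothetical stand-in namespace fake_boost (not a real library), and shows the two usual ways out: qualify the call as std::begin, or use the string's member functions.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical stand-in for Boost.Range's boost::begin/boost::end: like the
// real ones, these are generic templates accepting any range by reference.
namespace fake_boost {
template <class Range>
auto begin(Range& r) -> decltype(r.begin()) { return r.begin(); }
template <class Range>
auto end(Range& r) -> decltype(r.end()) { return r.end(); }
}  // namespace fake_boost

// The same two using-directives as in the question.
using namespace std;
using namespace fake_boost;

// Counts the bytes of `s` by walking it with iterators.
inline std::size_t count_bytes(std::string& s) {
    // auto first = begin(s);  // error: std::begin and fake_boost::begin are
    //                         // equally good matches, so the call is ambiguous
    std::size_t n = 0;
    for (auto it = std::begin(s); it != std::end(s); ++it) {  // qualified: OK
        ++n;
    }
    return n;
}

// Member functions are never subject to this overload ambiguity.
inline std::size_t count_bytes_member(const std::string& s) {
    return static_cast<std::size_t>(s.cend() - s.cbegin());
}
```

In the question's program, the corresponding change would be to build the iterators from std::begin(utf8_text)/std::end(utf8_text) or str.cbegin()/str.cend(). The iterator type given to u8_to_u32_iterator then has to be the string's iterator type (for example string::const_iterator) rather than the const char* used with a literal; that Boost variant is a suggestion and has not been compiled here.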

I also tried this:

auto str = u8"你好，世界！";
auto &&utf8_text = str;

but I still get the error "no matching function for call to 'begin' and 'end'".
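The second error has a different cause: auto deduces a pointer from a string literal (const char* in C++17), and a raw pointer has no begin()/end() at all. Since getline() already yields a std::string, one option is to skip Boost and work on that string directly through its member functions. Below is a minimal standard-library-only sketch of the split described at the top of the question. The tokenizing rule is an assumption: consecutive ASCII letters and digits form one token, every non-ASCII character (so each Chinese character) is its own token, and other ASCII bytes such as spaces only separate tokens. The names split_mixed and utf8_sequence_length are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence starting with lead byte `b`,
// or 0 if `b` cannot start a sequence (e.g. a stray continuation byte).
inline std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: common CJK ideographs
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Splits UTF-8 text: ASCII letter/digit runs stay together, each non-ASCII
// character becomes its own token, other ASCII bytes only end a run.
inline std::vector<std::string> split_mixed(const std::string& text) {
    std::vector<std::string> tokens;
    std::string run;  // ASCII run being accumulated
    auto flush = [&] {
        if (!run.empty()) {
            tokens.push_back(run);
            run.clear();
        }
    };
    for (std::size_t i = 0; i < text.size();) {
        const auto lead = static_cast<unsigned char>(text[i]);
        const std::size_t len = utf8_sequence_length(lead);
        if (len == 0 || i + len > text.size())
            throw std::runtime_error("invalid UTF-8 at byte " + std::to_string(i));
        for (std::size_t k = 1; k < len; ++k)  // continuation bytes: 10xxxxxx
            if ((static_cast<unsigned char>(text[i + k]) & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 at byte " + std::to_string(i + k));
        if (len == 1) {
            if (std::isalnum(lead)) run += text[i];
            else flush();  // space/punctuation: separator, dropped
        } else {
            flush();
            tokens.push_back(text.substr(i, len));  // one whole character
        }
        i += len;
    }
    flush();
    return tokens;
}
```

Calling split_mixed on a line read with std::getline that contains 哈囉hi你好hello yields 哈, 囉, hi, 你, 好, hello. The validation only checks lead and continuation byte patterns; it does not reject overlong encodings, surrogates, or values above U+10FFFF, and it treats each code point, not each user-perceived character, as a unit.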
