I am trying to parse a std::string that might contain Chinese characters. For example, given a string containing
哈囉hi你好hello
I want to split it into 6 strings: 哈, 囉, hi, 你, 好, hello. The string is read with getline() from a text file. Following the post How to use boost::spirit to parse UTF-8?, here's my current code:
#include <boost/regex/pending/unicode_iterator.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/range.hpp>
#include <iterator>
#include <iostream>
#include <ostream>
#include <cstdint>
#include <string>
#include <vector>
using namespace boost;
using namespace std;
using namespace std::string_literals;
int main()
{
    string str = u8"哈囉hi你好hello"; //actually got from getline()
    auto &&utf8_text = str;
    u8_to_u32_iterator<const char*>
        tbegin(begin(utf8_text)), tend(end(utf8_text));
    vector<uint32_t> result;
    spirit::qi::parse(tbegin, tend, *spirit::standard_wide::char_, result);
    for(auto &&code_point : result) {
        cout << code_point << ";";
    }
}
But I got the error: call to 'begin' and 'end' is ambiguous.
It works when I declare auto &&utf8_text = u8"哈囉hi你好hello" directly, but I can't write it that way because the content of the string comes from getline().
I also tried this:
auto str = u8"你好,世界!";
auto &&utf8_text = str;
but I still got the error: no matching function for call to 'begin' and 'end'.
auto with a string literal deduces a pointer to char (the array decays to const char*), not a std::string. If you want a std::string, you have to write the type out.