Reputation: 287
I have a problem when I try to parse an XML file containing a specific kanji:
退
After debugging, I see that the problem is in this function of RapidXml:
struct text_pure_no_ws_pred
{
    static unsigned char test(Ch ch)
    {
        return internal::lookup_tables<0>::lookup_text_pure_no_ws[static_cast<unsigned char>(ch)];
    }
};
const unsigned char lookup_tables<Dummy>::lookup_text_pure_no_ws[256] =
{
// 0 1 2 3 4 5 6 7 8 9 A B C D E F
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 1
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 2
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, // 3
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 4
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 5
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 6
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 7
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 8
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 9
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // A
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // B
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // C
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // D
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // E
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 // F
};
where ch is the kanji 退. For this character the function returns false (0), while for all other characters it returns true. Why? Do you have any idea?
Upvotes: 0
Views: 505
Reputation: 15175
RapidXML does not fully support Unicode: only UTF-8 is fully supported, while wide-character encodings (UTF-16, UTF-32) are not.
http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1character_types_and_encodings
See: Rapidxml and UTF8
The only options you have are: (1) convert the kanji to UTF-8 and hope it works, or (2) convert to a non-Unicode code page and hope that works with RapidXML.
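To illustrate the first option, here is a minimal sketch of encoding a single code point as UTF-8. The helper encode_utf8 and the example function are hypothetical names, not part of RapidXml, and the sketch assumes the input is a valid Unicode scalar value (no surrogate or range checks). In practice you would more likely convert the whole document with your platform's conversion facilities.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of RapidXml): encode one code point as UTF-8.
// Assumes cp is a valid Unicode scalar value; no surrogate or range checks.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Usage: U+9000 (退) becomes the three bytes E9 80 80.
inline void encode_utf8_example()
{
    assert(encode_utf8(0x9000) == "\xE9\x80\x80");
}
```

Each of those three bytes is non-zero and indexes an entry of the 256-entry table that holds 1, so a char-based parse would treat them as ordinary text.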
Upvotes: 1
Reputation: 8065
It looks like Ch is a wide character type holding a Unicode value. 退 is U+9000, and static_cast<unsigned char>(0x9000) is 0, so the lookup reads index 0 of the table, which holds 0.
You need a table that holds a lot more than 256 values.
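The truncation can be reproduced in isolation. The sketch below uses a hypothetical lookup function as a stand-in for the 256-entry table from the question (0 for '\0', '&' and '<', 1 otherwise), and test_guarded is one possible workaround under the assumption that every code unit above 0xFF is ordinary text. Neither function is RapidXml code.

```cpp
#include <cassert>

// Hypothetical stand-in for the table in the question:
// 0 for '\0', '&' and '<', 1 for every other index.
inline unsigned char lookup(unsigned char index)
{
    return (index == 0 || index == '&' || index == '<') ? 0 : 1;
}

// Mirrors the predicate in the question: the cast keeps only the low 8 bits.
template <class Ch>
unsigned char test_truncating(Ch ch)
{
    return lookup(static_cast<unsigned char>(ch));
}

// Hypothetical guard: anything that does not fit in one byte is treated as text.
// Negative narrow chars also take this branch, which matches the table,
// since all entries from 0x80 to 0xFF are 1.
template <class Ch>
unsigned char test_guarded(Ch ch)
{
    if (static_cast<unsigned long>(ch) > 0xFF)
        return 1;
    return lookup(static_cast<unsigned char>(ch));
}

inline void truncation_example()
{
    const char16_t kanji = 0x9000;          // 退
    assert(test_truncating(kanji) == 0);    // low byte is 0x00, same slot as '\0'
    assert(test_guarded(kanji) == 1);       // guard bypasses the table
    assert(test_guarded(u'<') == 0);        // ASCII behaviour is unchanged
}
```

A guard like this avoids a larger table, at the cost of assuming that no character above 0xFF ever needs special handling by the predicate.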
Upvotes: 1