Reputation: 1305
I have a UTF-16 encoded string that I want to convert to a floating-point value.
For example, given a UTF-16 string like u"1342.223", the conversion should return 1342.223.
With a UTF-8 string I would use the std::stod function, but how can I do the same for a UTF-16 encoded std::u16string?
Upvotes: 2
Views: 888
Reputation: 48635
There is no standard function for this. If you can use std::wstring on a system that happens to use 16-bit wide characters, you could use:
double d;
std::wistringstream(L"1342.223") >> d;
Otherwise, you could take advantage of the fact that numeric digits have the same code values in UTF-16 and in ASCII/UTF-8 to write a fast conversion function. It is not ideal, but it should be reasonably efficient:
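#include <algorithm>
#include <cstdlib>
#include <string>

double u16stod(std::u16string const& u16s)
{
    // buffer sized to the input, so long strings cannot overflow it
    std::string buf(u16s.size(), '\0');
    // narrow each UTF-16 code unit; numeric characters keep the same value
    std::transform(std::begin(u16s), std::end(u16s), std::begin(buf),
        [](char16_t c){ return char(c); });
    // some error checking here?
    return std::strtod(buf.c_str(), nullptr);
}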
Upvotes: 1
Reputation: 149085
First, converting a UTF-16 numeric character string to a narrow character string is trivial. Even if you cannot be sure that the narrow character set is ASCII for 7-bit characters, C guarantees that the codes for '0' to '9' are consecutive, and the same is true for Unicode (0x30 to 0x39). So the code can be as simple as this (it only needs <string>, plus <cstdlib> for strtod):
double u16strtod(const std::u16string& u16) {
char *beg = new char[u16.size() + 1];
char *str = beg;
for (char16_t uc: u16) {
if (uc == u' ') *str++ = ' '; // special processing for possible . and space
else if (uc == u'.') *str++ = '.';
else if ((uc < u'0') || (uc > u'9')) break; // could use better error processing
else {
*str++ = '0' + (uc - u'0');
}
}
*str++ = '\0';
char *end;
double d = strtod(beg, &end); // could use better error processing
delete[] beg;
return d;
}
It is even simpler if the narrow character set is ASCII:
double u16strtod(const std::u16string& u16) {
char *beg = new char[u16.size() + 1];
char *str = beg;
for (char16_t uc: u16) {
if ((uc <= 0) || (uc >= 127)) break; // can only contain ASCII characters
else {
*str++ = uc; // and the unicode code IS the ASCII code
}
}
*str++ = '\0';
char *end;
double d = strtod(beg, &end);
delete[] beg;
return d;
}
Upvotes: 1
Reputation: 131976
If you know for a fact that your string is nicely formatted (e.g. no spaces), and if and only if performance is critical (i.e. you're parsing millions or billions of numbers), don't dismiss the possibility of just decoding it yourself by looping over the string. Look at the standard library source code (perhaps compare libc++ and libstdc++) to see what they do, and adapt it. Of course, in these cases you should also take care to parallelize your work, try to exploit SIMD, and so on.
Upvotes: 0