I wrote a small test program (see the end of this post) that uses libicu's uidna_IDNToASCII
function to punycode a Unicode domain name.
$ g++ -std=c++11 -W -Wall test.cpp -licucore
$ ./a.out EXAMΠLE.com
xn--examle-s0e.com
$ ./a.out EXAMPLE.com
EXAMPLE.com
Punycoder.com confirms that xn--examle-s0e.com is the punycode for examπle.com (with the Greek and the ASCII both lowercased). But when I give my program the pure-ASCII EXAMPLE.com, libicu fails to lowercase any of it!
How can I convince libicu to lowercase pure-ASCII domain names too?
Here's the complete C++11 source code I'm using:
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unicode/uidna.h>
#include <unicode/ustring.h>
#include <vector>

std::string convert_utf8_to_idna(const std::string& input) {
    UErrorCode err = U_ZERO_ERROR;
    std::int32_t needed = 0;

    // Convert the UTF-8 input to UTF-16, substituting U+FFFD for invalid bytes.
    auto src = std::vector<UChar>(1000);
    (void)u_strFromUTF8WithSub(
        src.data(), src.size(), &needed,
        input.data(), input.size(),
        0xFFFD, nullptr, &err
    );
    src.resize(needed);  // chop off the unused excess
    assert(err == U_ZERO_ERROR);

    // Run the IDNA ToASCII operation on the whole domain name.
    auto dest = std::vector<UChar>(1000);
    needed = uidna_IDNToASCII(
        src.data(), src.size(),
        dest.data(), dest.size(),
        UIDNA_ALLOW_UNASSIGNED, nullptr, &err
    );
    assert(err == U_ZERO_ERROR);
    dest.resize(needed);  // chop off the unused excess

    // The result is pure ASCII, so narrowing each UChar to char is lossless.
    return std::string(dest.begin(), dest.end());
}

int main(int argc, char **argv) {
    std::string input = (argc >= 2) ? argv[1] : "example.com";
    std::string output = convert_utf8_to_idna(input);
    printf("%s\n", output.c_str());
}
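For comparison, here is a minimal sketch of a stopgap that sidesteps libicu entirely: lowercase the ASCII letters of the result after the fact. It assumes that post-processing the output of uidna_IDNToASCII is acceptable, which should be safe only because that output is pure ASCII. The helper name ascii_lowercase is hypothetical and is not part of ICU.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not an ICU function): lowercase only the ASCII
// letters 'A'..'Z' and leave every other byte untouched. It deliberately
// avoids std::tolower so the result does not depend on the current locale.
std::string ascii_lowercase(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a')
                                      : static_cast<char>(c);
    });
    return s;
}
```

In the program above, this would mean returning ascii_lowercase(std::string(dest.begin(), dest.end())) from convert_utf8_to_idna. I would still prefer to have libicu do the lowercasing itself, if there is an option for that.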