Quuxplusone


ICU's IDNA/punycode API doesn't lowercase names by default?

I wrote a small test program (see the end of this post) that uses libicu's uidna_IDNToASCII function to punycode a Unicode domain name.

$ g++ -std=c++11 -W -Wall test.cpp -licucore
$ ./a.out EXAMΠLE.com
xn--examle-s0e.com
$ ./a.out EXAMPLE.com
EXAMPLE.com

Punycoder.com confirms that xn--examle-s0e.com is the punycode for examπle.com (with the Greek and the ASCII both lowercased). But when I give my program the pure-ASCII EXAMPLE.com, libicu fails to lowercase any of it!

How can I convince libicu to lowercase pure-ASCII domain names too?

Here's the complete C++11 source code I'm using:

#include <cassert>   // assert
#include <cstdint>   // std::int32_t
#include <cstdio>
#include <string>
#include <unicode/uidna.h>
#include <unicode/ustring.h>
#include <vector>

std::string convert_utf8_to_idna(const std::string& input) {
    UErrorCode err = U_ZERO_ERROR;
    std::int32_t needed = 0;

    auto src = std::vector<UChar>(1000);
    (void)u_strFromUTF8WithSub(
        src.data(), src.size(), &needed,
        input.data(), input.size(),
        0xFFFD, nullptr, &err
    );
    assert(err == U_ZERO_ERROR); // check before trusting `needed`
    src.resize(needed); // chop off the unused excess

    auto dest = std::vector<UChar>(1000);
    needed = uidna_IDNToASCII(
        src.data(), src.size(),
        dest.data(), dest.size(),
        UIDNA_ALLOW_UNASSIGNED, nullptr, &err
    );
    assert(err == U_ZERO_ERROR);
    dest.resize(needed); // chop off the unused excess

    // On success the ToASCII output is pure ASCII, so narrowing each UChar to char is lossless.
    return std::string(dest.begin(), dest.end());
}

int main(int argc, char **argv) {
    std::string input = (argc >= 2) ? argv[1] : "example.com";
    std::string output = convert_utf8_to_idna(input);
    printf("%s\n", output.c_str());
}
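
One workaround I can sketch, assuming the converted output is always plain ASCII and that DNS compares ASCII letters case-insensitively, is to run an ASCII-only lowercasing pass over the result. As far as I can tell, the IDNA2003 ToASCII operation (RFC 3490, section 4.1) skips the Nameprep step, which is where case folding happens, whenever a label is already all-ASCII; that would explain the EXAMPLE.com output above. Note that `ascii_lowercase` is a hypothetical helper name of my own, not an ICU function.

```cpp
#include <string>

// Hypothetical helper (not part of ICU): lowercases only ASCII 'A'..'Z'
// and leaves every other byte untouched. Locale-independent on purpose,
// so it behaves the same regardless of the process locale.
std::string ascii_lowercase(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}
```

In the program above this would be applied as `ascii_lowercase(convert_utf8_to_idna(input))`. It only papers over the symptom; it does not change what libicu itself returns.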
