Robert M

boost::locale::normalize() returns empty strings with ICU backend

Reduced problem code:

#include <algorithm>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
#include <boost/locale.hpp>

std::string fold_case_nfkc(std::string_view str)
{
    return boost::locale::normalize(boost::locale::fold_case(std::string(str)), boost::locale::norm_nfkc);
}

std::string normalize_nfkc(std::string_view str)
{
    return boost::locale::normalize(std::string(str), boost::locale::norm_nfkc);
}

std::string fold_case_nfc(std::string_view str)
{
    return boost::locale::normalize(boost::locale::fold_case(std::string(str)), boost::locale::norm_nfc);
}

std::string normalize_nfc(std::string_view str)
{
    return boost::locale::normalize(std::string(str), boost::locale::norm_nfc);
}

bool same_text(std::string_view left_, std::string_view right_)
{
    auto left{ fold_case_nfkc(left_) };
    auto right{ fold_case_nfkc(right_) };
    return left.compare(right) == 0;
}

int main()
{
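    // List the available backends on one line, e.g. "backends: icu posix std".
    auto lbm = boost::locale::localization_backend_manager::global();
    auto s = lbm.get_all_backends();
    std::cout << "backends:";
    std::for_each(s.begin(), s.end(), [](const std::string& x){ std::cout << ' ' << x; });
    std::cout << std::endl;
    // Make ICU the default backend, then install a locale built from the environment.
    lbm.select("icu");
    boost::locale::localization_backend_manager::global(lbm);
    boost::locale::generator g;
    std::locale::global(g(""));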
    auto test = u8"#접시가숟가락으로도망쳤다";
    std::cout << "input: " << test << std::endl;
    std::cout << "fold_case_nfkc: " << fold_case_nfkc(test) << std::endl;
    std::cout << "normalize_nfc: " << normalize_nfc(test) << std::endl;
    return 0;
}

The expected output is:

backends: icu posix std
input: #접시가숟가락으로도망쳤다
fold_case_nfkc: #접시가숟가락으로도망쳤다
normalize_nfc: #접시가숟가락으로도망쳤다

The output I actually get, if icu is the locale backend:

rmorales2005@tillie:~ % clang++ -o test locale_test.cpp -std=c++17 -I/usr/local/include -L/usr/local/lib -lboost_locale
rmorales2005@tillie:~ % ./test
backends: icu posix std
input: #접시가숟가락으로도망쳤다
fold_case_nfkc: #
normalize_nfc: #

(This system is FreeBSD 12.1, with clang version 8.0.1; boost-libs installed via ports)

If I use posix, or run the program on Windows, I get the expected output. But for posix this is only because it doesn't even support normalization.

How do I get this code to work with icu as the backend?


Answers (1)

sehe

It could be your terminal emulator playing tricks. On my Linux box I get similar output when running from inside Vim, but piping the output through e.g. od or xxd shows that the bytes are there, and when I redirect it to a file, the text shows up correctly in an editor.
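To make the same check from inside the program (what od or xxd does from the outside), you can print the byte count and a hex dump instead of the raw text, since a hex dump cannot be swallowed by the terminal's UTF-8 rendering. The helper below is a hypothetical sketch (hex_bytes is not part of Boost.Locale), assuming the strings are UTF-8 encoded std::string values as in the question.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format each byte of `s` as two lowercase hex digits,
// separated by single spaces, similar to `od -An -tx1` output.
// Unlike printing `s` itself, this cannot be hidden by the terminal.
std::string hex_bytes(const std::string& s)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0)
            out << ' ';
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out << std::setw(2) << static_cast<unsigned>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

For example, printing `fold_case_nfkc(test).size()` and `hex_bytes(fold_case_nfkc(test))` in the question's main() would show whether the result really stops after the '#' (1 byte, `23`) or still holds all twelve Hangul syllables (three bytes each in UTF-8, so 37 bytes in total).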

Note, though, that with my C++20 compiler the char8_t const* string (from u8"") cannot be streamed to std::cout, so I changed it to a regular "" literal after making sure that my source file is UTF-8 encoded.
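If you would rather keep the u8 literal, one option is to copy its code units into a std::string. Since C++20 (P0482) a u8"" literal has type const char8_t[N] and streaming a char8_t pointer to std::cout is ill-formed, while in C++17 the same literal is const char[N]. The hypothetical helper from_u8 below is a sketch that accepts both, assuming the literal holds UTF-8 code units.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy the UTF-8 code units of a u8"" literal into a
// std::string. Works for const char[N] (C++17) and const char8_t[N] (C++20).
template <typename CharT, std::size_t N>
std::string from_u8(const CharT (&literal)[N])
{
    static_assert(sizeof(CharT) == 1, "expects single-byte (UTF-8) code units");
    // N counts the terminating NUL, so copy only the first N - 1 code units.
    // Reading char8_t objects through a const char* is allowed (char may alias anything).
    return std::string(reinterpret_cast<const char*>(literal), N - 1);
}
```

With this, `auto test = from_u8(u8"#접시가숟가락으로도망쳤다");` gives a std::string that can be both streamed and passed to the question's fold_case_nfkc and normalize_nfc functions.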

See the compiler message: https://wandbox.org/permlink/BQnIsAzXMQVkE3Zn (compare with your compiler version and flags)

Here's a demonstration:

Screenshot: the missing terminal output, for all backends.

Screenshot: the hex-dump-filtered output, for all backends.

Screenshot: the redirected output, for all backends.
