Reputation: 31
After building the Boost 1.52 boost::regex libraries with International Components for Unicode (ICU) support, a case-insensitive regular expression does not match uppercase and lowercase German umlauts against each other as expected. The pattern and both test strings are UTF-8 encoded ("\303\226" is Ö, "\303\266" is ö):
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
    // "\303\226" is the UTF-8 encoding of U+00D6 (Ö); "\303\266" is U+00F6 (ö).
    static const std::string pattern("^.*" "\303\226" ".*$");
    static const std::string test1("SCH" "\303\226" "NE");
    static const std::string test2("sch" "\303\266" "ne");
    // Narrow-character regex: operates on individual char values.
    static const boost::regex exp(pattern, boost::regex::icase);
    const char *result = (boost::regex_match(test1, exp)) ? "Match" : "NoMatch";
    std::cout << "Testing \"" << test1 << "\" against pattern \"" << pattern
              << "\" : " << result << std::endl;
    result = (boost::regex_match(test2, exp)) ? "Match" : "NoMatch";
    std::cout << "Testing \"" << test2 << "\" against pattern \"" << pattern
              << "\" : " << result << std::endl;
}
Yields:
Testing "SCHÖNE" against pattern "^.*Ö.*$" : Match
Testing "schöne" against pattern "^.*Ö.*$" : NoMatch
Upvotes: 3
Views: 2995
Reputation: 21
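Use the Unicode-aware ICU types from Boost.Regex (see "Working With Unicode and ICU String Types" in the Boost.Regex documentation). A plain boost::regex over std::string works on individual char values, so with UTF-8 input the icase flag never sees Ö and ö as single characters. Building the expression with boost::make_u32regex and matching with boost::u32regex_match, both declared in <boost/regex/icu.hpp>, makes the library treat the input as Unicode code points: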
#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/regex/icu.hpp>
int main()
{
    // "\303\226" is the UTF-8 encoding of U+00D6 (Ö); "\303\266" is U+00F6 (ö).
    static const std::string pattern("^.*" "\303\226" ".*$");
    static const std::string test1("SCH" "\303\226" "NE");
    static const std::string test2("sch" "\303\266" "ne");
    // make_u32regex interprets the std::string pattern as UTF-8.
    static const boost::u32regex exp = boost::make_u32regex(pattern, boost::regex::icase);
    // u32regex_match likewise decodes the subject string before matching.
    const char *result = (boost::u32regex_match(test1, exp)) ? "Match" : "NoMatch";
    std::cout << "Testing \"" << test1 << "\" against pattern \"" << pattern
              << "\" : " << result << std::endl;
    result = (boost::u32regex_match(test2, exp)) ? "Match" : "NoMatch";
    std::cout << "Testing \"" << test2 << "\" against pattern \"" << pattern
              << "\" : " << result << std::endl;
}
Upvotes: 2