Reputation: 2135
Detect if there is any non-ASCII character in a file path
I have a UTF-8 encoded string that stores a file path, for instance C:\Users\myUser\Downloads\ü.pdf. I have already checked that the string holds a valid path in the local file system, but because I am sending it to a different process that supports only ASCII, I need to find out whether it contains any non-ASCII characters.
How can I do that?
Upvotes: 8
Views: 15835
Reputation: 2135
As suggested in several comments and highlighted in @CrisLuengo's answer, we can iterate over the characters and look for any that has the upper bit set:
#include <iostream>
#include <string>
#include <algorithm>

bool isASCII (const std::string& s)
{
    return !std::any_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) > 127;
    });
}

int main()
{
    std::string s1 { "C:\\Users\\myUser\\Downloads\\Hello my friend.pdf" };
    std::string s2 { "C:\\Users\\myUser\\Downloads\\ü.pdf" };
    std::cout << std::boolalpha << isASCII(s1) << "\n";
    std::cout << std::boolalpha << isASCII(s2) << "\n";
}
Output:
true
false
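To see why the second path is rejected: in UTF-8, ü (U+00FC) is encoded as the two bytes 0xC3 0xBC, and both are above 127. The following is only a minimal sketch; countNonAsciiBytes is a hypothetical helper name, not part of the code above, and the test string spells the bytes with hex escapes so the result does not depend on the source file's encoding.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: counts the bytes in s whose value is above 127.
inline std::size_t countNonAsciiBytes(const std::string& s)
{
    return static_cast<std::size_t>(
        std::count_if(s.begin(), s.end(), [](char c) {
            // Convert to unsigned char first so the comparison also works
            // on platforms where plain char is signed.
            return static_cast<unsigned char>(c) > 127;
        }));
}
```

Under these assumptions, countNonAsciiBytes("\xC3\xBC.pdf") gives 2, while a pure ASCII name such as "Hello.pdf" gives 0.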
Upvotes: 9
Reputation: 60444
An ASCII character uses only the lower 7 bits of a char (values 0-127). A non-ASCII Unicode character encoded in UTF-8 uses char elements that all have the upper bit set. So you can simply iterate over the char elements and check whether any of them has a value above 127, e.g.:
bool containsOnlyASCII(const std::string& filePath) {
    for (auto c: filePath) {
        if (static_cast<unsigned char>(c) > 127) {
            return false;
        }
    }
    return true;
}
A note on the cast: std::string contains char elements. The standard doesn't define whether char is signed or unsigned. If it's signed, a byte with the upper bit set holds a negative value, so comparing it directly against 127 would never be true. Converting it to unsigned char is well-defined: the standard specifies exactly how this is done.
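The effect of the cast can be sketched with two small functions. This is only an illustration: both function names are hypothetical, and signed char is used explicitly in the second one to mimic a platform where plain char is signed.

```cpp
// Hypothetical helper: true if the byte has its upper bit set.
// The conversion to unsigned char is well-defined whether plain char
// is signed or unsigned.
inline bool isNonAsciiByte(char c)
{
    return static_cast<unsigned char>(c) > 127;
}

// What the comparison sees without the cast when char is signed:
// a signed 8-bit value can never exceed 127, so this always returns false.
inline bool naiveCheckOnSignedChar(signed char c)
{
    return c > 127;
}
```

With the cast, the byte 0xC3 (the first byte of ü in UTF-8) is reported as non-ASCII; without it, the same byte is a negative value on a signed-char platform and slips through the check.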
Upvotes: 9