Reputation: 1434
I am working on an internationalization project. Do other languages, such as Arabic or Chinese, use different representations for digits besides 0-9? If so, are there versions of atoi() that will account for these other representations?
I should add that I am mainly concerned with parsing input from the user. If the user types in some other representation, I want to be sure that I recognize it as a number and treat it accordingly.
Upvotes: 6
Views: 1315
Reputation: 7744
You could use std::wistringstream together with std::locale to parse the integer.
#include <sstream>
#include <locale>
using namespace std;
int main()
{
    locale mylocale(""); // Construct locale object with the user's default preferences
    wistringstream wss(L"1"); // your number string
    wss.imbue( mylocale ); // Imbue that locale
    int target_int = 0;
    wss >> target_int; // sets failbit on wss if no integer could be read
    return 0;
}
More info on stream class and on locale class.
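As far as I know, standard stream extraction matches only the widened forms of '0' through '9', so native digits may need to be normalized before parsing. Here is a minimal sketch of that idea, assuming the input has already been decoded to UTF-32. The helpers digit_value and parse_unicode_int are hypothetical names made up for illustration, and the table covers only a handful of digit blocks.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper: value of a decimal digit from a few Unicode digit blocks.
// Returns -1 if the code point is not in one of the listed blocks.
int digit_value(char32_t c) {
    // "Zero" code point of each supported block. Illustrative, not complete.
    static const char32_t zeros[] = {
        U'0',        // ASCII digits
        U'\u0660',   // Arabic-Indic digits
        U'\u06F0',   // Extended Arabic-Indic digits
        U'\u0966',   // Devanagari digits
        U'\uFF10',   // Fullwidth digits
    };
    for (char32_t zero : zeros) {
        if (c >= zero && c <= zero + 9) return static_cast<int>(c - zero);
    }
    return -1;
}

// Hypothetical helper: parse a non-negative integer made only of such digits.
// Returns false for empty input, any unrecognized character, or int overflow.
bool parse_unicode_int(const std::u32string& s, int& out) {
    if (s.empty()) return false;
    long long value = 0;
    for (char32_t c : s) {
        const int d = digit_value(c);
        if (d < 0) return false;
        value = value * 10 + d;
        if (value > std::numeric_limits<int>::max()) return false;
    }
    out = static_cast<int>(value);
    return true;
}
```

With this sketch, parse_unicode_int(U"\u0661\u0663\u0667", n) would accept the Arabic-Indic digits for 137. Chinese numerals such as 一二三 are not positional decimal digits, so they would need separate handling. For real coverage, a library with full Unicode data (ICU, for example) is probably a better fit than a hand-written table.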
Upvotes: 6
Reputation:
If you are concerned about international characters, then you need to ensure you use a "Unicode-aware" function such as _wtoi(..).
You can also check whether UNICODE is defined to make the code type-independent (from MSDN):
TCHAR tstr[4] = TEXT("137");
int num = 0; // declared once so that both branches assign the same variable
#ifdef UNICODE
size_t cCharsConverted;
CHAR strTmp[SIZE]; // SIZE equals (2*(sizeof(tstr)+1)). This ensures enough
// room for the multibyte characters if they are two
// bytes long and a terminating null character. See Security
// Alert below.
// _TRUNCATE converts as much as fits and keeps strTmp null-terminated.
wcstombs_s(&cCharsConverted, strTmp, sizeof(strTmp), tstr, _TRUNCATE);
num = atoi(strTmp);
#else
num = atoi(tstr);
#endif
In this example, the standard C library function wcstombs translates Unicode to ASCII. The example relies on the fact that the digits 0 through 9 can always be translated from Unicode to ASCII, even if some of the surrounding text cannot. The atoi function stops at any character that is not a digit.
Your application can use the National Language Support (NLS) LCMapString function to process text that includes the native digits provided for some of the scripts in Unicode.
Caution: Using the wcstombs function incorrectly can compromise the security of your application. Make sure that the application buffer for the string of 8-bit characters is at least of size 2*(char_length +1), where char_length represents the length of the Unicode string. This restriction is made because, with double-byte character sets (DBCSs), each Unicode character can be mapped to two consecutive 8-bit characters. If the buffer does not hold the entire string, the result string is not null-terminated, posing a security risk. For more information about application security, see Security Considerations: International Features.
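On the point that atoi stops at the first character that is not a digit: for validating user input that can hide trailing junk, because atoi gives no indication of where it stopped. A small sketch using std::strtol, which does report the stopping position, is below. The helper name is_whole_number is hypothetical, and range errors (ERANGE) are not checked here.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: true only if the entire string is a base-10 integer.
// Note: overflow (errno == ERANGE) is deliberately not handled in this sketch.
bool is_whole_number(const char* text, long& out) {
    char* end = nullptr;
    out = std::strtol(text, &end, 10);
    // end == text means no digits were consumed;
    // *end != '\0' means something followed the number.
    return end != text && *end == '\0';
}
```

For example, std::atoi("137abc") returns 137 without complaint, while is_whole_number("137abc", v) returns false.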
Upvotes: 2