Reputation: 121
I have a library which needs to parse double numbers which always use a point '.' as decimal separator. Unfortunately for this case, strtod() respects the locale which might use a different separator and thus parsing can fail. I can't setlocale() - it isn't thread-safe. So I'm searching for a clean locale-independent strtod implementation now. I have found multiple implementations so far, but all of them look hacky or just like bad code. Can someone recommend a well-tested, working, clean (ANSI) C implementation for me?
Upvotes: 11
Views: 6843
Reputation: 3720
There is the fast_float library, which, according to the README, is used by GCC, Chromium, WebKit, and LLVM. Even though it is a C++ library, you should easily be able to wrap it in a C-compatible function that works like strtod, provided you have access to a C++ compiler.
If you have access to C++17, you could also write a C wrapper for std::from_chars.
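#include <charconv>
#include <cctype>
#include <string>
#include <cerrno>
#include <system_error>
extern "C" double strtod_locale_independent(const char *str, char **str_end) {
    // Cast to unsigned char: passing a negative char to std::isspace is undefined behavior.
    while (std::isspace(static_cast<unsigned char>(*str))) {
        ++str;
    }
    // std::from_chars rejects a leading '+', which strtod accepts; skip it unless a '-' follows.
    if (*str == '+' && str[1] != '-') {
        ++str;
    }
    double d = 0.0;
    auto result = std::from_chars(str, str + std::char_traits<char>::length(str), d);
    if (str_end) {
        *str_end = const_cast<char *>(result.ptr);
    }
    // Like strtod, leave errno untouched on success.
    if (result.ec != std::errc()) {
        errno = static_cast<int>(result.ec);
    }
    return d;
}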
Upvotes: 0
Reputation: 144780
Since you cannot change the locale because the application is multi-threaded, and because re-implementing strtod is a daunting task if you wish to get the precise behavior, here is a simple alternative that should work for most cases:
#include <locale.h>
#include <stdlib.h>
#include <string.h>
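double my_strtod(const char *s, char **endp) {
    char buf[1024];
    const char *dot = strchr(s, '.');
    /* fall back if there is no period or if it lies beyond the copied portion */
    if (dot == NULL || (size_t)(dot - s) >= sizeof(buf) - 1) {
        return strtod(s, endp);
    }
    struct lconv *lp = localeconv();
    *buf = '\0';
    strncat(buf, s, sizeof(buf) - 1);
    /* assumes the locale's decimal point is a single char */
    buf[dot - s] = *lp->decimal_point;
    char *end;
    double v = strtod(buf, &end);
    if (endp) {
        /* translate the end position in buf back into the original string */
        *endp = (char *)s + (end - buf);
    }
    return v;
}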
If my_strtod() is used on very long strings, it may be more efficient to analyse the initial portion of the argument string to determine how many characters to copy from it.
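One possible way to do that prefix analysis, as a rough sketch only: scan the leading decimal literal by hand, copy just that part into a small buffer, and convert the copy. The names numeric_prefix_len and my_strtod_prefix are made up for this illustration. The sketch assumes a single-char decimal point in the current locale and only recognizes plain decimal literals (no hex floats; inf and nan go through the fallback path, as do literals longer than the buffer, and the fallback is locale-dependent).

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the leading decimal floating-point literal
 * (whitespace, sign, digits, '.', digits, optional exponent).
 * Returns 0 if no digit is found. */
static size_t numeric_prefix_len(const char *s) {
    const char *p = s;
    while (isspace((unsigned char)*p)) p++;
    if (*p == '+' || *p == '-') p++;
    const char *start = p;
    while (isdigit((unsigned char)*p)) p++;
    size_t ndigits = (size_t)(p - start);
    if (*p == '.') {
        const char *frac = ++p;
        while (isdigit((unsigned char)*p)) p++;
        ndigits += (size_t)(p - frac);
    }
    if (ndigits == 0) return 0;
    if (*p == 'e' || *p == 'E') {
        /* only accept the exponent if at least one digit follows */
        const char *q = p + 1;
        if (*q == '+' || *q == '-') q++;
        if (isdigit((unsigned char)*q)) {
            while (isdigit((unsigned char)*q)) q++;
            p = q;
        }
    }
    return (size_t)(p - s);
}

/* Hypothetical variant of my_strtod() that copies only the numeric prefix. */
double my_strtod_prefix(const char *s, char **endp) {
    char buf[128];
    size_t n = numeric_prefix_len(s);
    if (n == 0 || n >= sizeof(buf)) {
        return strtod(s, endp);   /* fallback: locale-dependent */
    }
    memcpy(buf, s, n);
    buf[n] = '\0';
    char *dot = memchr(buf, '.', n);
    if (dot != NULL) {
        *dot = *localeconv()->decimal_point;
    }
    char *end;
    double v = strtod(buf, &end);
    if (endp) {
        *endp = (char *)s + (end - buf);
    }
    return v;
}
```

Called as my_strtod_prefix("  3.25abc", &end), this should return 3.25 and leave end pointing at "abc", no matter how long the rest of the string is.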
If you use my_strtod() on modifiable arrays used only in the current thread, you can just temporarily replace the first period . with a decimal-point character from the current locale:
#include <locale.h>
#include <stdlib.h>
#include <string.h>
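double my_strtod(char *s, char **endp) {
    char *p = strchr(s, '.');
    if (p != NULL) {
        /* temporarily substitute the locale's decimal point (assumed to be a single char) */
        struct lconv *lp = localeconv();
        *p = *lp->decimal_point;
    }
    double v = strtod(s, endp);
    if (p != NULL) {
        /* restore the original period */
        *p = '.';
    }
    return v;
}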
Of course, this approach uses strtod(), so it assumes that other concurrent threads do not mess with the current locale.
Upvotes: 1
Reputation: 11
While this is very late, it may provide some insights for future readers. Following the pattern of Florian Kusche's reply above, here is an exclusively "C" alternative that I have successfully tested on various flavors of Linux. One of the requirements of this solution is to temporarily override the user's system locale prior to executing strtod().
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define _LOCALE_N_ 13
int category[_LOCALE_N_] =
{
LC_ALL , // All of the locale
LC_ADDRESS , // Formatting of addresses and geography-related items (*)
LC_COLLATE , // String collation
LC_CTYPE , // Character classification
LC_IDENTIFICATION, // Metadata describing the locale (*)
LC_MEASUREMENT , // Settings related to measurements (metric versus US customary) (*)
LC_MESSAGES , // Localizable natural-language messages
LC_MONETARY , // Formatting of monetary values
LC_NAME , // Formatting of salutations for persons (*)
LC_NUMERIC , // Formatting of nonmonetary numeric values
LC_PAPER , // Settings related to the standard paper size (*)
LC_TELEPHONE , // Formats to be used with telephone services (*)
LC_TIME // Formatting of date and time values
};
void _store_locale_info(char *vals[_LOCALE_N_])
{
    /* store the current locale information in an array of strings for future use */
    int i;
    for (i = 0; i < _LOCALE_N_; i++)
    {
        /* passing NULL queries the current setting without changing it; the returned
         * string may be overwritten by later setlocale() calls, so keep a copy */
        const char *loc_str = setlocale(category[i], NULL);
        vals[i] = NULL;
        if (loc_str)
        {
            vals[i] = malloc(strlen(loc_str) + 1);
            if (vals[i])
                strcpy(vals[i], loc_str);
        }
    }
}
void _restore_locale_info(char *vals[_LOCALE_N_])
{
    /* restore the locale information from a previously-populated array of strings */
    int i;
    for (i = 0; i < _LOCALE_N_; i++)
    {
        if (vals[i])
        {
            setlocale(category[i], vals[i]);
            free(vals[i]);
        }
    }
}
double _strtod_c (const char *string, char **endPtr)
{
    /* Wrapper function for strtod() that enforces the "C" locale before converting a floating-point
     * number from an ASCII decimal representation to internal double-precision format.
     * Note: setlocale() changes the locale of the whole process, so this is not safe
     * while other threads are running. */
    char *vals[_LOCALE_N_];
    double rval = 0;
    _store_locale_info(vals);
    setlocale(LC_ALL, "C");
    rval = strtod(string, endPtr);
    _restore_locale_info(vals);
    return rval;
}
int main(void)
{
    const char *str = "1024.123456";
    char *endPtr;
    char locale_str[100];
    printf("\nstr = \"%s\"\n\n", str);
    printf("Locale\n");
    /* this demo assumes the de_DE locale is installed; otherwise setlocale()
     * returns NULL and the strcpy() calls below are invalid */
    strcpy(locale_str, setlocale(LC_ALL, "C"));
    printf("%-6s :: strtod(str, endPtr) = %.15g\n", locale_str, strtod(str, &endPtr));
    strcpy(locale_str, setlocale(LC_ALL, "de_DE"));
    printf("%-6s :: strtod(str, endPtr) = %.15g\n", locale_str, strtod(str, &endPtr));
    printf("---\n");
    strcpy(locale_str, setlocale(LC_ALL, "C"));
    printf("%-6s :: _strtod_c(str, endPtr) = %.15g\n", locale_str, _strtod_c(str, &endPtr));
    strcpy(locale_str, setlocale(LC_ALL, "de_DE"));
    printf("%-6s :: _strtod_c(str, endPtr) = %.15g\n", locale_str, _strtod_c(str, &endPtr));
    printf("\n");
    return 0;
}
The expected output on a compatible Linux installation is
str = "1024.123456"
Locale
C :: strtod(str, endPtr) = 1024.123456
de_DE :: strtod(str, endPtr) = 1024
---
C :: _strtod_c(str, endPtr) = 1024.123456
de_DE :: _strtod_c(str, endPtr) = 1024.123456
Upvotes: 0
Reputation: 161
Warning: The proposed implementation from ruby contains bugs. I wouldn't mind the small difference pointed out by gavin, but if you try to parse something like "0.000000000000000000000000000000000000783475" you will get 0.0 instead of 7.834750e-37, which is what the stock strtod() returns.
Other solution:
#include <locale>
#include <sstream>
#include "strtod_locale_independent.h"
extern "C" double strtod_locale_independent(const char* s)
{
    std::istringstream text( s );
    // the classic locale is the "C" locale, so '.' is always the decimal separator
    text.imbue(std::locale::classic());
    double result = 0.0;
    text >> result;
    return result;
}
I don't know how fast this is, though.
Upvotes: 3
Reputation: 71
Following the answer above, I tried using the Ruby implementation at ruby_1_8/missing/strtod.c. However, for some inputs this gives results that differ from both gcc's built-in parser and strtod from stdlib.h, on Mac as well as on Linux:
char * endptr ;
double value1 = 1.15507e-173 ;
double value2 = strtod( "1.15507e-173", &endptr ) ;
double value3 = test_strtod( "1.15507e-173", &endptr ) ;
assert( sizeof( double ) == sizeof( unsigned long ) ) ;
printf( "value1 = %lg, 0x%lx.\n", value1, *(unsigned long*)( &value1 ) ) ;
printf( "value2 = %lg, 0x%lx.\n", value2, *(unsigned long*)( &value2 ) ) ;
printf( "value3 = %lg, 0x%lx.\n", value3, *(unsigned long*)( &value3 ) ) ;
assert( value1 == value2 ) ;
assert( value1 == value3 ) ;
which prints
value1 = 1.15507e-173, 0x1c06dace8bda0ee0.
value2 = 1.15507e-173, 0x1c06dace8bda0ee0.
value3 = 1.15507e-173, 0x1c06dace8bda0edf.
Assertion failed: (value1 == value3), function main, file main.c, line 16.
So my advice is to test the chosen implementation before use.
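As a rough sketch of such a test, the helper below compares a candidate parser against the stock strtod() bit for bit on a list of inputs. The names strtod_fn, double_bits, count_mismatches and sample_inputs are made up for this illustration. It assumes the process is still in the default "C" locale (so that strtod() is a valid reference) and that double is a 64-bit type. The sample inputs are the strings mentioned in this thread; a real test would use many more.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit double");

/* Hypothetical type for any strtod-compatible function. */
typedef double (*strtod_fn)(const char *, char **);

/* Bit pattern of a double, copied with memcpy to avoid aliasing problems. */
static uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Inputs taken from this thread. */
static const char *const sample_inputs[] = {
    "1.15507e-173",
    "0.000000000000000000000000000000000000783475",
    "1024.123456",
};

/* Counts the inputs on which the candidate differs from strtod(),
 * either in the resulting bits or in the end position. */
static int count_mismatches(strtod_fn candidate,
                            const char *const *inputs, size_t n) {
    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        char *end_ref, *end_cand;
        double ref = strtod(inputs[i], &end_ref);
        double got = candidate(inputs[i], &end_cand);
        if (double_bits(ref) != double_bits(got) || end_ref != end_cand) {
            printf("mismatch for \"%s\": reference 0x%016" PRIx64
                   ", candidate 0x%016" PRIx64 "\n",
                   inputs[i], double_bits(ref), double_bits(got));
            bad++;
        }
    }
    return bad;
}
```

With the Ruby-derived function from the snippet above, the call would be count_mismatches(test_strtod, sample_inputs, 3); any non-zero result means the candidate disagrees with strtod() on at least one input.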
Upvotes: 2
Reputation: 3006
There's also gdtoa available on netlib, BSD style license: http://www.netlib.org/fp/gdtoa.tgz
Upvotes: 1
Reputation: 3869
Grab some known implementation (that doesn't depend on atof), such as the one distributed with ruby: ruby_1_8/missing/strtod.c.
Upvotes: 3