user877329

Reputation: 6200

Locale-invariant string processing with strtod strtof atof printf?

Are there any plans for adding versions of C standard library string processing functions that are invariant under current locale?

Currently there are lots of fragile workarounds, for example, from jansson/strconv.c:

static void to_locale(strbuffer_t *strbuffer)
{
    const char *point;
    char *pos;

    point = localeconv()->decimal_point;
    if(*point == '.') {
        /* No conversion needed */
        return;
    }

    pos = strchr(strbuffer->value, '.');
    if(pos)
        *pos = *point;
}

static void from_locale(char *buffer)
{
    const char *point;
    char *pos;

    point = localeconv()->decimal_point;
    if(*point == '.') {
        /* No conversion needed */
        return;
    }

    pos = strchr(buffer, *point);
    if(pos)
        *pos = '.';
}

These functions preprocess their input so that it can be used independently of the current locale, under the following assumptions:

  1. That the delimiter is one byte
  2. No call to setlocale happens between one of these fix-up functions and the call to any of the affected functions
  3. The string can be modified before conversion

(1) implies that the preprocessing approach breaks on exotic locales (see https://en.wikipedia.org/wiki/Decimal_mark#Hindu.E2.80.93Arabic_numeral_system for examples). (2) implies that the preprocessing approach cannot be thread-safe without a lock, and that lock would have to be added to the C library. (3) is simply unreasonable: it rules out converting const or read-only input without copying it first.

If it were only possible to specify the locale for a single call to a string-processing function as a parameter, not affecting any other threads, none of these restrictions would apply.

Questions:

  1. Are there any reports to WG14, or WG21 that address this defect?
  2. If so, why haven't they been merged into the standard? It would be nothing more than a new set of functions that take a locale as argument.
  3. What is the canonical workaround?

Update:

After searching the Internet, I found the *_l functions, available on FreeBSD, GNU/Linux and Mac OS X. Similar functions also exist on Windows. These solve my problem; however, they are not in POSIX, which is a superset of C (not really, POSIX relaxes some rules on pointers). So questions 1 and 2 remain open.
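For illustration, here is a minimal sketch of what a call to one of the *_l functions can look like. It assumes glibc (where strtod_l is only declared when _GNU_SOURCE is defined) or a BSD/macOS system with <xlocale.h>; the wrapper name parse_with_locale is made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc: exposes strtod_l; must come before any #include */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h>  /* BSD and macOS declare the _l functions here */
#endif

/* Hypothetical wrapper: convert text to double using an explicit locale
 * object instead of the current locale. `end` works as in strtod. */
double parse_with_locale(const char *text, locale_t loc, char **end)
{
    assert(text != NULL);
    return strtod_l(text, end, loc);
}
```

A caller would create the locale once, for example with newlocale(LC_NUMERIC_MASK, "C", (locale_t)0), pass it to every conversion, and release it with freelocale when done. The input string is never modified and no other thread is affected.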

Upvotes: 7

Views: 2471

Answers (3)

Glibc does not provide locale-taking variants of every function (there is no printf_l, for instance), but it has the POSIX-standard uselocale function, which sets the locale per thread. So instead of relying on many locale-specific functions, you can use any standard function, including one wrapped in a library call, by changing the locale temporarily:

locale_t original = uselocale(loc);
// use printf/scanf/etc which now use `loc`
uselocale(original);
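A more complete sketch of the same pattern, assuming a POSIX.1-2008 system. The helper name parse_double_c is made up for this example, and creating the locale on every call is only for brevity; a real program would probably create it once and reuse it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose newlocale/uselocale/freelocale */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#if defined(__APPLE__)
#include <xlocale.h>
#endif

/* Hypothetical helper: parse a double with the "C" locale, whatever the
 * global or thread locale currently is. Returns 0 on success, -1 on error. */
int parse_double_c(const char *text, double *out)
{
    assert(text != NULL && out != NULL);

    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    locale_t previous = uselocale(c_loc);  /* affects the calling thread only */
    char *end;
    double value = strtod(text, &end);
    uselocale(previous);                   /* restore before freeing c_loc */
    freelocale(c_loc);

    if (end == text)
        return -1;                         /* nothing was converted */
    *out = value;
    return 0;
}
```

Because uselocale only changes the locale of the calling thread, this avoids the need for a lock around setlocale, and the input string is left untouched.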

Upvotes: 0

over_optimistic

Reputation: 1419

SQLite has a locale-independent printf implementation, which is good for this sort of thing as it makes doubles compatible with SQL syntax rules. If you can include SQLite as a dependency, that might be a viable option.

Upvotes: 2

Jonathan Leffler

Reputation: 754090

BSD and macOS Sierra (and Mac OS X before it) support _l functions that allow you to specify the locale, rather than relying on the current locale. For example:

int
fprintf_l(FILE * restrict stream, locale_t loc, const char * restrict format, ...);

int
printf_l(locale_t loc, const char * restrict format, ...);

int
snprintf_l(char * restrict str, size_t size, locale_t loc, const char * restrict format, ...);

int
sprintf_l(char * restrict str, locale_t loc, const char * restrict format, ...);

and:

int
fscanf_l(FILE * restrict stream, locale_t loc, const char * restrict format, ...);

int
scanf_l(locale_t loc, const char * restrict format, ...);

int
sscanf_l(const char * restrict str, locale_t loc, const char * restrict format, ...);

As a general design, this seems sensible. The type locale_t is not part of Standard C but is part of POSIX (and defined in <locale.h> there), and used in <ctype.h> amongst other places. The BSD man pages say that the header to use is <xlocale.h> rather than <locale.h>; standardization would presumably settle this. Unless there is a major flaw in the design of the BSD functions, these should be a very good basis for any standardization effort, whether that was under POSIX or Standard C.
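On a system that has POSIX uselocale but lacks snprintf_l, the same call shape can be approximated. The following is only a sketch: compat_snprintf_l is a made-up name, not an existing library function, and it does not reproduce every detail of the BSD behaviour.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose uselocale and locale_t */
#endif
#include <assert.h>
#include <locale.h>
#include <stdarg.h>
#include <stdio.h>
#if defined(__APPLE__)
#include <xlocale.h>
#endif

/* Hypothetical stand-in using the same parameter order as BSD snprintf_l.
 * It switches the calling thread to `loc` for the duration of one call. */
int compat_snprintf_l(char *str, size_t size, locale_t loc,
                      const char *format, ...)
{
    assert(format != NULL);

    va_list args;
    locale_t previous = uselocale(loc);  /* loc == 0 only queries, no switch */
    va_start(args, format);
    int written = vsnprintf(str, size, format, args);
    va_end(args);
    uselocale(previous);                 /* put the thread's locale back */
    return written;
}
```

With a locale obtained from newlocale(LC_ALL_MASK, "C", (locale_t)0), a call such as compat_snprintf_l(buf, sizeof buf, c_loc, "%.2f", 1.5) formats the number with '.' as the decimal point regardless of the current locale.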

One issue with the BSD design might be that the locale_t structure is passed by value, not by (constant restricted) pointer, which is a little surprising. However, it is consistent with the POSIX functions such as:

int   isalpha_l(int, locale_t);

A similar scheme might be devised for handling time zone settings, too. There'd be more work in setting that up since there isn't already a time zone type (whereas the locale_t is part of POSIX already — and could probably be adopted without change into standard C). But, combined with locale settings, it could make the time routines more easily usable in diverse environments from a single executable.

Upvotes: 4
