Psyfun

Reputation: 369

Scansets in sscanf with string delimiters

I have an ASCII string in the form of:

00001\x02This is a string\x030000100001\x021.0\x03\x021.0\x03\x021.0\x03\x021.0\x03001

In descriptive terms, the string consists of a 5-digit zero-padded integer, a string enclosed by the STX and ETX ASCII characters, two more 5-digit zero-padded integers, four floating-point values each enclosed by STX and ETX, and finally a 3-digit zero-padded integer. I am trying to parse it with sscanf, but it is not behaving as I expect. I am using the following format string:

"%5hu\x02%[0-9a-zA-Z _-]s\x03%5hu%05hu\x02%lf\x03\x02%lf\x03\x02%lf\x03\x02%lf\x03%3hhu"

I have tried changing the scanset to different values including [^\x03]. I have also tried adding and removing length specifiers. I am pretty sure it is the scanset that is causing the problem. I am wondering if it is having a hard time with the STX and ETX character literals. Anyone have an idea why this isn't working? Or a better alternative using pure C89? Thanks.

For those wanting the full code to test:

unsigned short one;
char two[32];
unsigned short three, four;
double five, six, seven, eight;
unsigned char nine;
char temp[] = "00001\x02This is a string\x03""0000100002\x02""1.1\x03\x02""2.2\x03\x02""3.3\x03\x02""4.5\x03""003";
sscanf(temp, "%5hu\x02%[-0-9a-zA-Z _]s\x03%5hu%05hu\x02%lf\x03\x02%lf\x03\x02%lf\x03\x02%lf\x03%3hhu",
       &one, two, &three, &four, &five, &six, &seven, &eight, &nine);

Upvotes: 1

Views: 1292

Answers (4)

Psyfun

Reputation: 369

So, I have determined that scanset is broken in this version of the C runtime libraries. Here is code that works and doesn't work:

unsigned short one;
char two[32];
unsigned short three, four;
double five, six, seven, eight;
unsigned char nine;
int rc1, rc2;

char temp[] = "00001\x02string\x03""0000100002\x02""1.1\x03\x02""2.2\x03\x02""3.3\x03\x02""4.5\x03""003";

char format1[] = "%5hu\x02%[a-z]\x03%5hu%5hu\x02%lf\x03\x02%lf\x03\x02%lf\x03\x02%lf\x03%3hhu";
char format2[] = "%5hu\x02%s\x03%5hu%5hu\x02%lf\x03\x02%lf\x03\x02%lf\x03\x02%lf\x03%3hhu";

rc1 = sscanf(temp, format1, &one, two, &three, &four, &five, &six, &seven, &eight, &nine);   
rc2 = sscanf(temp, format2, &one, two, &three, &four, &five, &six, &seven, &eight, &nine);

In the above code, rc1 is 1 (only one item successfully scanned) and rc2 is 9 (all nine items successfully scanned). So my conclusion is that scansets do not work properly with this hardware/software combination. Does anyone have any other thoughts or conclusions? Thanks for all the help. I didn't accept any answer as the solution, but I did give points for the helpful ones.

Upvotes: 1

chux

Reputation: 153508

The format string is too easy to get wrong. Consider breaking it up (easier to understand and maintain) and checking the result:

#define Int5 "%5hu"
// Note:  no 0 ^

#define STX  "\x02"
#define ETX  "\x03"
// Could use hexadecimal constants here as the string is broken up.

#define EncStr STX "%31[0-9a-zA-Z _-]" ETX
// Note:                        no s ^   (per @Jonathan's comment, s is not part of %[])
// String limit      ^

#define FP    STX "%lf" ETX
#define Int3  "%3hhu"

if (9 == sscanf(temp, Int5 EncStr Int5 Int5 FP FP FP FP Int3, 
    &one, two, &three, &four, &five, &six, &seven, &eight, &nine)) Success();

Note: temp needs to be broken up so that each hexadecimal escape ends where intended; alternatively, use octal escapes:

char temp[]  = "00001\x02This is a string\x03" "0000100001\x02" "1.0\x03\x02" 
     "1.0\x03\x02" "1.0\x03\x02" "1.0\x03" "001";
char temp[]  = "00001\002This is a string\0030000100001\0021.0\003\0021.0\003\0021.0\003\0021.0\003001";

Upvotes: 1

Jonathan Leffler

Reputation: 753990

Watch out: octal escape sequences are limited to at most 3 digits after the backslash, but hex escape sequences are not limited to two or three hexadecimal digits, so \x030000100001 is all a single character.

GCC warned me:

ssss.c:6:1: error: hex escape sequence out of range [-Werror]
 "00001\x02This is a string\x030000100001\x021.0\x03\x021.0\x03\x021.0\x03\x021.0\x03001";
 ^
ssss.c:6:1: error: hex escape sequence out of range [-Werror]

(In the light of the edit to the question, you're already aware of this issue.)

Also, note that a scanset stands on its own; it is not a qualifier to the s conversion specifier. Your format string looks for an actual s in the data after what is matched by the scanset, and will never find one since the scanset eats any s's before the literal match. This is your actual problem; remove the s after the %[0-9a-zA-Z _-] scanset.

This code works. Note the judicious breaking up of the data string so that the hex constants are terminated where you want them terminated. The split in the format string simplifies presentation. C joins together two adjacent string literals, which is extremely useful.

#include <stdio.h>

int main(void)
{
    char const data[] =
        "00001\x02This is a string\x03" "0000100001\x02" "1.0\x03\x02"
        "1.0\x03\x02" "1.0\x03\x02" "1.0\x03" "001";
    char const format[] =
        "%5hu\x02%19[0-9a-zA-Z _-]\x03%5hu%05hu\x02%lf\x03\x02%lf\x03\x02"
        "%lf\x03\x02%lf\x03%3hhu";

    unsigned short i1;
    char s2[20];
    unsigned short i3;
    unsigned short i4;
    double d5;
    double d6;
    double d7;
    double d8;
    unsigned char i9;
    int rc;

    if ((rc = sscanf(data, format, &i1, s2, &i3, &i4, &d5, &d6, &d7, &d8, &i9)) != 9)
        printf("sscanf failed - %d conversions\n", rc);
    else
        printf("i1 = %d; s2 = [%s]; i3 = %d; i4 = %d; d5 = %f;\n"
               "d6 = %f; d7 = %f; d8 = %f; i9 = %d\n",
               i1, s2, i3, i4, d5, d6, d7, d8, i9);
    return 0;
}

Sample output:

i1 = 1; s2 = [This is a string]; i3 = 1; i4 = 1; d5 = 1.000000;
d6 = 1.000000; d7 = 1.000000; d8 = 1.000000; i9 = 1

I added this code before the return 0 in main(). This has no s after the scanset; when I left the s in place, sscanf() returned the value 2, not 9.

    unsigned short one;
    char two[32];
    unsigned short three, four;
    double five, six, seven, eight;
    unsigned char nine;
    char temp[] = "00001\x02This is a string\x03""0000100002\x02""1.1\x03\x02""2.2\x03\x02""3.3\x03\x02""4.5\x03""003";
    if ((rc = sscanf(temp, "%5hu\x02%31[-0-9a-zA-Z _]\x03%5hu%05hu\x02%lf\x03\x02%lf\x03\x02%lf\x03\x02%lf\x03%3hhu",
           &one, two, &three, &four, &five, &six, &seven, &eight, &nine)) != 9)
        printf("sscanf failed - %d conversions\n", rc);
    else
        printf("i1 = %d; s2 = [%s]; i3 = %d; i4 = %d; d5 = %f;\n"
               "d6 = %f; d7 = %f; d8 = %f; i9 = %d\n",
               one, two, three, four, five, six, seven, eight, nine);

The combined program output was:

i1 = 1; s2 = [This is a string]; i3 = 1; i4 = 1; d5 = 1.000000;
d6 = 1.000000; d7 = 1.000000; d8 = 1.000000; i9 = 1
i1 = 1; s2 = [This is a string]; i3 = 1; i4 = 2; d5 = 1.100000;
d6 = 2.200000; d7 = 3.300000; d8 = 4.500000; i9 = 3

Testing on Mac OS X 10.9.1 Mavericks with GCC 4.8.2.

Upvotes: 1

abligh

Reputation: 25129

I can't immediately see why it doesn't work, though I would break it down into a single short string and make it longer and longer until it fails.

Personally, I would parse this with strtok (on the boundaries encapsulated by STX and ETX) then use scanf to read in the specific floats and integers.

Upvotes: 1
