Stefano Raneri

Reputation: 343

How to read a string with whitespace and an int on the same line in C

How can I read a text file like this one:

Acqua Naturale 200
Coca Cola 100
Bibite 300

and store the names (such as "Acqua Naturale" and "Coca Cola") in a string and their int values in an int variable, using sscanf()?

The example code is this:

struct Test
{
 char name[16];
 int id;
};


 char * buffer = malloc(sizeof(struct Test));

 while(fgets(buffer, sizeof(struct Test), filep))
    {
      if(sscanf(buffer, "%s %d", p.name, &p.id) == 2)
      {
        //do something with data

      }
    }

Upvotes: 0

Views: 429

Answers (5)

Stefano Raneri

Reputation: 343

I found this solution using strtok(), strcpy(), strcat(), atoi() and isdigit(). I am using a linked list to store the data, so I think it is a specific solution. Ignore the parameter of the function Load() and the function CreateNewNodeOfList().

void Load(HeadNode *pp) // ignore parameter
{
  FILE *f;
  struct Test p = { "", 0 }; // start with an empty name so strcat() has a valid string to append to
  char buffer[128];          // a text line can be longer than sizeof(struct Test)
  char * token;
  if(!(f = fopen(PATH, "r")))
  {
    perror("Errore");
    exit(-1);
  }

  while(fgets(buffer, sizeof buffer, f))
    {
    // split on spaces and on the trailing newline left by fgets()
    for(token = strtok(buffer, " \n"); token != NULL; token = strtok(NULL, " \n"))
      {
          if(isdigit((unsigned char)token[0]))
            {
              p.id = atoi(token);
            }
            else if(strlen(p.name) + strlen(token) + 2 <= sizeof p.name)
            {
              // only append when token, a space and the terminator still fit
              strcat(p.name, token);
              strcat(p.name, " ");
            }
      }
        CreateNewNodeOfList(p, pp); //ignore this function 
        strcpy(p.name, "");
    }

fclose(f);

}

Upvotes: 0

ryyker

Reputation: 23208

Two quick observations:

  • strtok() over sscanf() is a better choice given this particular task.

  • Unless there is only one record (data line) in the input file, an array of struct (as opposed to a single instance) is needed to contain the data.

Rationale:
The more defined and predictable the syntax of a source file, the less complex it is to parse. Your file, as described, has predictable contents. With limited variability in syntax, tokenizing each record with the strtok() function is a good choice.

For what you are doing, the only variability in your file content would be the number of lines and the number of alpha strings preceding the numeric string at the end. The rest assumes space-separated sub-strings within each line, with only the last having numeric content. So one approach that would accommodate this type of file might allocate an array of struct at run time, sized by the number of lines to process, and use the strtok() function to read through the elements, storing each based on the type of string it is (either alpha or numeric).

Example approach:

file: x.txt contains the following:

Acqua Naturale 200
Coca Cola 100
Bibite 300
Nesbits Gold 400
Fanta Iced Orange 500
Coca Cola Cherry Cream 600

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char filename[] = {".\\x.txt"};

typedef struct {
    char name[200]; // add plenty of space
    int id;
}TEST;

void PopulateTest(TEST *t, char *file);//populate struct with content of file.
int GetLines(char *name);//get line count

int main(int argc, char *argv[])
{
    int lineCount = GetLines(filename);//get lines in file
    int i;

    TEST *test;//to create a variable number of instances of TEST

    test = calloc(lineCount, sizeof(TEST));
    if(test)
    {
        PopulateTest(test, filename);
    }
    for(i=0;i<lineCount;i++)
    {
        ;//do something with results    
    }
    free(test);

    return 0;
}

void PopulateTest(TEST *t, char *file)
{
    int num = 0;
    int i = 0;
    char *tok = NULL;
    char line[200] = {0};
    char accum[200] = {0};
    FILE *fp = fopen(file, "r");//open the file passed in, not the global
    if(fp)
    {
        while(fgets(line, sizeof(line), fp))
        {
            accum[0] = '\0';//start each record with an empty name
            num = 0;
            tok = strtok(line, " \n");//also strip the newline left by fgets
            while(tok)// this loop accommodates a variable number of fields within each line 
            {
                if(isdigit((unsigned char)tok[0]))//test for sub-string content
                {
                    num = atoi(tok);
                }
                else               //read string segments and reconstruct string,
                {
                    strcat(accum, tok);
                    strcat(accum, " ");
                }
                tok = strtok(NULL, " \n");
            }
            strcpy(t[i].name, accum);//populate struct element members with parsed data.
            t[i].id = num;
            i++;
        }
        fclose(fp);
    }
    return; 
}

int GetLines(char *name)
{
    int count = 0;
    char line[200] = {0};
    FILE *fp = fopen(name, "r");
    if(fp)
    {
        while(fgets(line, sizeof(line), fp))
        {
            count++;
        }
        fclose(fp);
    }
    return count;
}

Upvotes: 1

Steve Summit

Reputation: 47923

Before trying to write code to read this file, you should think a little more about how the file is defined -- precisely how it's defined.

Informally, the definition of the file is "the first column is a string possibly containing whitespace, and the second column is an integer". But what separates the columns?

If the columns are separated by whitespace, and if the first column can contain whitespace, then the first column isn't really the first column, it's potentially multiple columns. That is, the line

Coca Cola 100

really contains three columns.

So if we want to go down this road, we have to try to differentiate between a second column that's an integer, and a first column that (though it might contain whitespace) does not look like an integer.

But if we go down that road, we have two pretty significant problems:

  1. It's hard to code. It's probably impossible to code satisfactorily using scanf or sscanf alone.

  2. It's still ambiguous. What if Coca Cola comes out with a new product "Coca Cola 2020"? Then we'll have a line like

    Coca Cola 2020 50

So my bottom line is, if it was me, I wouldn't even try to write code to parse this file format. I would come up with a cleaner, less ambiguous file format, perhaps

Coca Cola, 100

or

"Coca Cola",100

or

Coca Cola|100

and then write some clean and simple code to parse that. (I probably still wouldn't use scanf, though; I'd probably use something more like strtok. See also this chapter in my C Programming notes.)
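As a rough sketch of what that simple parsing might look like for the pipe-separated variant, assuming the struct Test from the question: the function name parse_delimited is made up for illustration, and atoi() is used only for brevity, since it does no error checking.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct Test {
    char name[16];
    int id;
};

/* Parse one line of the hypothetical "name|count" format into *out.
   Returns 1 on success, 0 if the line is malformed or the name is too long.
   Note: strtok() modifies line. */
int parse_delimited(char *line, struct Test *out)
{
    char *name = strtok(line, "|");      /* text before the separator */
    char *count = strtok(NULL, "|\n");   /* text after it, newline stripped */

    if (name == NULL || count == NULL)
        return 0;                        /* separator or count missing */
    if (strlen(name) >= sizeof out->name)
        return 0;                        /* name would not fit */

    strcpy(out->name, name);
    out->id = atoi(count);               /* unchecked conversion, for brevity */
    return 1;
}
```

Because the separator can no longer appear inside the name, a line such as Coca Cola 2020|50 parses without any ambiguity.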


Addendum: the other road to potentially go down is to count columns from the right-hand edge. In this case, you could write code to, in effect, say that the product name is in columns 1 to N-1, and the count is column N. This can work as long as there's at most one "column" containing whitespace.
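Counting from the right could be sketched roughly as below. The function name parse_from_right and its parameters are hypothetical. It treats the last whitespace-separated token as the count and rejoins everything before it with single spaces, so runs of spaces inside a name are collapsed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split line on whitespace; the last token becomes *count and the earlier
   tokens are rejoined with single spaces into name.
   Returns 1 on success, 0 on malformed or oversized input.
   Note: strtok() modifies line. */
int parse_from_right(char *line, char *name, size_t name_size, int *count)
{
    char *prev = NULL;   /* most recent token, not yet classified */
    size_t used = 0;     /* characters written to name so far */

    if (name_size == 0)
        return 0;
    name[0] = '\0';

    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (prev != NULL) {
            /* prev is not the last token, so it belongs to the name */
            size_t len = strlen(prev);
            size_t sep = (used > 0) ? 1 : 0;
            if (used + sep + len >= name_size)
                return 0;                    /* name would not fit */
            if (sep)
                name[used++] = ' ';
            memcpy(name + used, prev, len + 1);
            used += len;
        }
        prev = tok;
    }
    if (prev == NULL || used == 0)
        return 0;        /* need at least one name token plus a count */

    char *end;
    long value = strtol(prev, &end, 10);
    if (end == prev || *end != '\0')
        return 0;        /* last column is not a clean integer */
    *count = (int)value; /* range check omitted for brevity */
    return 1;
}
```

With this approach "Coca Cola 2020 50" yields the name "Coca Cola 2020" and the count 50, because only the final column is ever treated as a number.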

Upvotes: 1

chux

Reputation: 153338

To separate "Acqua Naturale 200" into "Acqua Naturale" and 200 is a problem of looking for an integer at the end of the line.

Various approaches.

Perhaps look for the last space separator.

OP nicely reads a line and then attempts to parse - this is better than scanf().

Note that OP's buffer size is too small. Consider "abcdefghijklmno -2000000000\n", valid input which needs 15 + 1 + 11 + 1 + 1 = 29 bytes. Certainly that is more than sizeof(struct Test), as the text of an int may need more space than the binary-encoded int (e.g. 2, 4 or 8 bytes).

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

... 

  FILE *filep;
  struct Test p;
  //                        p.name    sp  int  \n  \0
  #define LINE_SIZE (sizeof p.name  + 1 + 11 + 1 + 1)
  char buffer[LINE_SIZE *2]; // No need to be stingy with temp buffer space, go for x2

  while(fgets(buffer, sizeof buffer, filep)) {
    char *last_space = strrchr(buffer, ' ');
    if (last_space == NULL || (last_space - buffer) >= sizeof p.name ||
        sscanf(last_space, "%d", &p.id) != 1) { // also catches EOF when no number follows
      fprintf(stderr, "Bad input '%s'\n", buffer);
      break;
    }
    memcpy(p.name, buffer, last_space - buffer);
    p.name[last_space - buffer] = '\0';

    // Do something with `p`
  }

More robust code would use strtol() and look for extra junk after the number, as in "xxx 122zzz". Excessively long lines should be detected too.
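A minimal sketch of those two checks, not a drop-in replacement for the loop above. The helper names parse_int_strict and line_is_complete are invented for illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Convert s to an int, rejecting empty input, trailing junk such as
   "122zzz", and values outside the range of int.
   Returns 1 on success, 0 on failure. */
int parse_int_strict(const char *s, int *out)
{
    char *end;

    errno = 0;
    long value = strtol(s, &end, 10);
    if (end == s)
        return 0;                    /* no digits at all */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                    /* does not fit in an int */
    while (isspace((unsigned char)*end))
        end++;                       /* allow a trailing newline or spaces */
    if (*end != '\0')
        return 0;                    /* extra junk after the number */
    *out = (int)value;
    return 1;
}

/* A line read by fgets() is complete if it contains '\n'. Without one,
   the line was either too long for the buffer or is the last line of a
   file that lacks a final newline; feof() can tell those cases apart. */
int line_is_complete(const char *buffer)
{
    return strchr(buffer, '\n') != NULL;
}
```

In the loop above, parse_int_strict(last_space, &p.id) could stand in for the sscanf() call, and line_is_complete(buffer) could be tested right after fgets() returns.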

Upvotes: 1

Lundin

Reputation: 213378

There are some misconceptions here.

  • The input file appears to be a text file. If so, you cannot use the sizeof a struct as the number of bytes to read from it, since that would assume a binary format, and the text representation is longer. Instead of malloc, just allocate buffer as "large enough", for example char buffer[200];.
  • Even in case of a binary file, you should never read/write a struct directly to/from a file. This is because structs can contain padding bytes to fix alignment, and those can be located anywhere, in a non-portable manner. So if one program writes the file and another reads it, it will break. The common way to read/write structs is to "serialize" and "de-serialize" by reading/writing each individual member at a time.
  • So instead read the whole line as text into buffer, then parse through it. sscanf is a rather blunt tool for this unless you are certain of the buffer format. Instead you can search for the last space ' ' in the string and take anything before it as the name (make sure it is at most 15 characters, which leaves room for the null terminator), and pass everything after the space to strtol, which converts it to an integer.
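The last point could look roughly like this, assuming the struct Test from the question. The function name split_at_last_space is hypothetical, and the sketch skips range checking on the converted value.

```c
#include <stdlib.h>
#include <string.h>

struct Test {
    char name[16];
    int id;
};

/* Split line at its last space: the text before it is the name, the text
   after it is handed to strtol().
   Returns 1 on success, 0 on malformed input. */
int split_at_last_space(const char *line, struct Test *out)
{
    const char *last_space = strrchr(line, ' ');
    if (last_space == NULL || last_space == line)
        return 0;                    /* no separator, or empty name */

    size_t name_len = (size_t)(last_space - line);
    if (name_len >= sizeof out->name)
        return 0;                    /* name must be 15 characters or fewer */

    char *end;
    long value = strtol(last_space + 1, &end, 10);
    if (end == last_space + 1)
        return 0;                    /* nothing numeric after the space */

    memcpy(out->name, line, name_len);
    out->name[name_len] = '\0';
    out->id = (int)value;            /* range check omitted for brevity */
    return 1;
}
```

The line itself would still come from fgets() into a generously sized text buffer such as char buffer[200];, never one sized from the struct.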

Upvotes: 0
