Atlas80b
Atlas80b

Reputation: 119

Reading strings with spaces from a file

I'm working on a project and I just encountered a really annoying problem. I have a file which stores all the messages that my account received. A message is a data structure defined this way:

typedef struct _message{
char dest[16]; 
char text[512];
}message;

dest is a string that cannot contain spaces, unlike the other fields. Strings are acquired using the fgets() function, so dest and text can have "dynamic" length (from 1 character up to length-1 legit characters). Note that I manually remove the newline character after every string is retrieved from stdin.

The "inbox" file uses the following syntax to store messages:

dest
text

So, for example, if I have a message from Marco which says "Hello, how are you?" and another message from Tarma which says "Are you going to the gym today?", my inbox-file would look like this:

Marco
Hello, how are you?

Tarma
Are you going to the gym today?

I would like to read the username from the file and store it in string s1 and then do the same thing for the message and store it in string s2 (and then repeat the operation until EOF), but since text field admits spaces I can't really use fscanf().

I tried using fgets(), but as I said before the size of every string is dynamic. For example if I use fgets(my_file, 16, username) it would end up reading unwanted characters. I just need to read the first string until \n is reached and then read the second string until the next \n is reached, this time including spaces.

Any idea on how can I solve this problem?

Upvotes: 3

Views: 10627

Answers (3)

Jonathan Leffler
Jonathan Leffler

Reputation: 755010

You can use fgets() easily enough as long as you're careful. This code seems to work:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_MESSAGES = 20 };

typedef struct Message
{
    char dest[16]; 
    char text[512];
} Message;

static int read_message(FILE *fp, Message *msg)
{
    char line[sizeof(msg->text) + 1];
    msg->dest[0] = '\0';
    msg->text[0] = '\0';
    while (fgets(line, sizeof(line), fp) != 0)
    {
        //printf("Data: %zu <<%s>>\n", strlen(line), line);
        if (line[0] == '\n')
            continue;
        size_t len = strlen(line);
        line[--len] = '\0';
        if (msg->dest[0] == '\0')
        {
            if (len < sizeof(msg->dest))
            {
                memmove(msg->dest, line, len + 1);
                //printf("Name: <<%s>>\n", msg->dest);
            }
            else
            {
                fprintf(stderr, "Error: name (%s) too long (%zu vs %zu)\n",
                        line, len, sizeof(msg->dest)-1);
                exit(EXIT_FAILURE);
            }
        }
        else
        {
            if (len < sizeof(msg->text))
            {
                memmove(msg->text, line, len + 1);
                //printf("Text: <<%s>>\n", msg->dest);
                return 0;
            }
            else
            {
                fprintf(stderr, "Error: text for %s too long (%zu vs %zu)\n",
                        msg->dest, len, sizeof(msg->dest)-1);
                exit(EXIT_FAILURE);
            }
        }
    }
    return EOF;
}

int main(void)
{
    Message mbox[MAX_MESSAGES];
    int n_msgs;

    for (n_msgs = 0; n_msgs < MAX_MESSAGES; n_msgs++)
    {
        if (read_message(stdin, &mbox[n_msgs]) == EOF)
            break;
    }

    printf("Inbox (%d messages):\n\n", n_msgs);
    for (int i = 0; i < n_msgs; i++)
        printf("%d: %s\n   %s\n\n", i + 1, mbox[i].dest, mbox[i].text);

    return 0;
}

The reading code will handle (multiple) empty lines before the first name, between a name and the text, and after the last name. It is slightly unusual in they way it decides whether to store the line just read in the dest or text parts of the message. It uses memmove() because it knows exactly how much data to move, and the data is null terminated. You could replace it with strcpy() if you prefer, but it should be slower (the probably not measurably slower) because strcpy() has to test each byte as it copies, but memmove() does not. I use memmove() because it is always correct; memcpy() could be used here but it only works when you guarantee no overlap. Better safe than sorry; there are plenty of software bugs without risking extras. You can decide whether the error exit is appropriate — it is fine for test code, but not necessarily a good idea in production code. You can decide how to handle '0 messages' vs '1 message' vs '2 messages' etc.

You can easily revise the code to use dynamic memory allocation for the array of messages. It would be easy to read the message into a simple Message variable in main(), and arrange to copy into the dynamic array when you get a complete message. The alternative is to 'risk' over-allocating the array, though that is unlikely to be a major problem (you would not grow the array one entry at a time anyway to avoid quadratic behaviour when the memory has to be moved during each allocation).

If there were multiple fields to be processed for each message (say, date received and date read too), then you'd need to reorganize the code some more, probably with another function.

Note that the code avoids the reserved namespace. A name such as _message is reserved for 'the implementation'. Code such as this is not part of the implementation (of the C compiler and its support system), so you should not create names that start with an underscore. (That over-simplifies the constraint, but only slightly, and is a lot easier to understand than the more nuanced version.)

The code is careful not to write any magic number more than once.

Sample output:

Inbox (2 messages):

1: Marco
   How are you?

2: Tarma
   Are you going to the gym today?

Upvotes: 1

yulian
yulian

Reputation: 1627

As the length of each string is dynamic then, if I were you, I would read the file first for finding each string's size and then create a dynamic array of strings' length values.

Suppose your file is:

A long time ago
in a galaxy far,
far away....

So the first line length is 15, the second line length is 16 and the third line length is 12.

Then create a dynamic array for storing these values.

Then, while reading strings, pass as the 2nd argument to fgets the corresponding element of the array. Like fgets (string , arrStringLength[i++] , f);.

But in this way you'll have to read your file twice, of course.

Upvotes: 2

BLUEPIXY
BLUEPIXY

Reputation: 40155

#include <stdio.h>

int main(void){
    char username[16];
    char text[512];
    int ch, i;
    FILE *my_file = fopen("inbox.txt", "r");

    while(1==fscanf(my_file, "%15s%*c", username)){
        i=0;
        while (i < sizeof(text)-1 && EOF!=(ch=fgetc(my_file))){
            if(ch == '\n' && i && text[i-1] == '\n')
                break;
            text[i++] = ch;
        }
        text[i] = 0;
        printf("user:%s\n", username);
        printf("text:\n%s\n", text);
    }
    fclose(my_file);
    return 0;
}

Upvotes: 2

Related Questions