user8522137

Read a text file into a 2D array in C

I'm trying to read an entire text file into a 2D array, so that I can limit how much is stored and know where each new line starts (if anyone has a better idea, I'm open to suggestions).

This is what I have so far:

int main(int argc, char** argv) {

    char texto[15][45];
    char ch;
    int count = 0;
    FILE *f = fopen("texto.txt", "r");

    if(f == NULL)
        printf("ERRO ao abrir o ficheiro para leitura");

    while((ch = fgetc(f) != EOF))
        count++;

    rewind(f);

    int tamanho = count;

    texto = malloc(tamanho *sizeof(char));

    fscanf(f, "%s", texto);

    fclose(f);

    printf("%s", texto);

    return (EXIT_SUCCESS);
}

And the text file is like this

lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip

But I get this error

error: assignment to expression with array type

here

texto = malloc(tamanho *sizeof(char));

Upvotes: 0

Views: 190

Answers (2)

David C. Rankin

Reputation: 84561

The task you have been given is designed to make you understand the differences between, and the limitations of, character-oriented input, formatted input, and line-oriented input. You are setting your array limits as:

char texto[15][45];

The above declares an array of 15 1D arrays of 45 characters each, stored sequentially in memory (the definition of an array). That means at each index texto[0] - texto[14] you can store at most 45 characters (or a string of 44 characters followed by the nul-terminating character).

You are then given a file of seven lines of 45 characters each. But aren't there only 44 characters in each line? -- Wrong. Since the information is held in a text file (presumably, given "texto.txt"), there is an additional '\n' (newline) character at the end of each line. You must account for its presence when reading the file. Each line in the file will look something like the following:

        10        20        30        40
123456789012345678901234567890123456789012345
lorem ipsum lorem ipsum lorem ipsum lorem ip\n

(where the numbers simply represent a scale showing how many characters are present in each line)

The ASCII '\n' character is a single character.

The formatted-input Approach

Can you read the input with fscanf using the "%s" conversion specifier? (Answer: no) Why? The "%s" conversion specifier stops reading when it encounters the first whitespace character after reading non-whitespace characters. That means reading with fscanf (fp, "%s", ...) will stop reading after the 5th character (the end of the first word, "lorem").

While you can remedy this by using the character-class conversion specifier of the form [...], where the brackets contain the characters to be included (or excluded if the first character in the class is '^'), you leave the '\n' character unread in your input stream.

While you can remedy that by using the '*' assignment-suppression character to read and discard the next character (the newline) with "%*c", if you have any additional characters in the line, they too will remain in the input buffer (input stream, e.g. your file) unread.

Are you beginning to get the picture that doing file input with the scanf family of functions is inherently fragile? (you would be right)
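To see the two conversions side by side without involving a file, here is a minimal sketch that applies them to a string with sscanf. The helper names parse_word and parse_line are hypothetical and used only for illustration; both return the length of what was stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCOLS 45

/* Hypothetical helper: parse with "%s". Conversion stops at the first
 * whitespace after the leading word. Returns the stored length. */
size_t parse_word (const char *line, char out[NCOLS])
{
    assert (line && out);
    out[0] = '\0';
    if (sscanf (line, "%44s", out) != 1)
        return 0;
    return strlen (out);
}

/* Hypothetical helper: parse with a character class. Reads up to 44
 * characters and stops at (without consuming) the '\n'. */
size_t parse_line (const char *line, char out[NCOLS])
{
    assert (line && out);
    out[0] = '\0';
    if (sscanf (line, "%44[^\n]", out) != 1)
        return 0;
    return strlen (out);
}
```

Given one line of the sample file, parse_word stores only "lorem" (length 5), while parse_line stores all 44 characters of text. Note that parse_line returns 0 for an empty line ("\n"), because the character class must match at least one character.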

A naive implementation using fscanf could be:

#include <stdio.h>

#define NROWS 15    /* if you need a constant, #define one (or more) */
#define NCOLS 45

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""};
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* read up to NROWS lines of 44 char each with at most 1 trailing char */
    while (n < NROWS && fscanf (fp, "%44[^\n]%*c", texto[n]) == 1)
        n++;    /* increment line count */

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);

    return 0;
}

(note: if you can guarantee that your input file format is fixed and never varies, then this can be an appropriate approach. However, a single additional stray character in the file can torpedo this approach)
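To make the fragility concrete, the sketch below wraps the same fscanf loop in a function so it can be fed a deliberately malformed file. The function name read_rows_fscanf is an assumption made for this illustration, not something from the code above.

```c
#include <assert.h>
#include <stdio.h>

#define NCOLS 45

/* Hypothetical wrapper around the fscanf loop shown above.
 * Stores up to maxrows rows and returns how many were stored. */
size_t read_rows_fscanf (FILE *fp, char rows[][NCOLS], size_t maxrows)
{
    size_t n = 0;

    assert (fp && rows);
    /* 44 chars of text, then exactly one char discarded by %*c */
    while (n < maxrows && fscanf (fp, "%44[^\n]%*c", rows[n]) == 1)
        n++;
    return n;
}
```

If the first line of the file holds 46 characters instead of 44, the "%*c" discards the 45th character (not the newline), and the 46th character is then read as a row of its own. A two-line file therefore produces three rows, with one character silently lost. An empty line is worse still: the character class matches nothing, fscanf returns 0, and the loop stops early.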

Example Use/Output

$ ./bin/texto2dfscanf <dat/texto.txt
texto[ 0]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 1]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 2]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 3]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 4]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 5]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 6]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'

line-oriented Input

A better approach is always a line-oriented approach. Why? It allows you to separately validate the read of a line of data from your file (or from the user) and then validate parsing the necessary information from that line.

But there is an intentional catch in the sizing of texto that complicates a simplistic line-oriented approach. While you may be tempted to simply read each line of text directly into texto[0-14], you would only be reading the text into texto and leaving the '\n' unread. (What? I thought line-oriented input handles this? -- It does if you provide sufficient space in the buffer you are trying to fill...)

Line-oriented input functions (fgets and POSIX getline) read and include the trailing '\n' in the buffer being filled -- provided there is sufficient space. If using fgets, fgets will read no more characters than specified into the buffer (which protects your array bounds). Your task here has been designed to require reading 46 characters with a line-oriented function in order to read:

the text + '\n' + '\0'

(the text plus the newline plus the nul-terminating character)
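The effect of the buffer size can be checked with a small experiment. The sketch below, using a hypothetical helper named fgets_calls, counts how many fgets calls it takes to consume a stream when each call may store at most bufsz bytes.

```c
#include <assert.h>
#include <stdio.h>

#define MAXC 1024

/* Hypothetical helper: count the fgets calls needed to consume fp when
 * each call may store at most bufsz bytes (including the nul character). */
size_t fgets_calls (FILE *fp, size_t bufsz)
{
    char buf[MAXC];
    size_t calls = 0;

    assert (fp && bufsz >= 2 && bufsz <= MAXC);
    while (fgets (buf, (int)bufsz, fp))
        calls++;
    return calls;
}
```

For a file holding a single line of 44 characters plus '\n', a size of 45 takes two calls (the text first, then a second read that returns only "\n"), while a size of 46 takes one. That second, newline-only read is exactly what would end up in the next row of texto if you read into it directly.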

This forces you to do line-oriented input properly. Read the information into a buffer of sufficient size to handle the largest anticipated input line (and don't skimp on buffer size). Validate that your read succeeded. Then parse the information you need from the line in any manner you choose (sscanf is fine in this case). By doing it in this two-step manner, you can read the line, determine the original length of the line read (including the '\n') and validate whether it all fit in your buffer. You can then parse the 44 characters (plus room for the nul-terminating character).

Further, if additional characters remain unread, you know that up-front and can then continually read and discard the remaining characters in preparation for your next read.

A reasonable line-oriented approach could look something like the following:

#include <stdio.h>
#include <string.h>

#define NROWS 15    /* if you need a constant, #define one (or more) */
#define NCOLS 45
#define MAXC  1024

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""},
        buffer[MAXC] = "";
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (n < NROWS && fgets (buffer, MAXC, fp)) {
        size_t len = strlen (buffer);
        if (len && buffer[len-1] == '\n')   /* whole line read, trim the '\n' */
            buffer[--len] = 0;
        else if (len == MAXC-1) {           /* buffer full with no '\n' */
            int c;
            fprintf (stderr, "error: line %zu too long.\n", n + 1);
            /* discard remaining chars in this line before next read */
            while ((c = fgetc (fp)) != '\n' && c != EOF) {}
        }
        if (sscanf (buffer, "%44[^\n]", texto[n]) == 1)
            n++;
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);

    return 0;
}

(the output is the same)

character-oriented Input

The only method left is a character-oriented approach (which can be a very effective way of reading the file character-by-character). The only challenge with a character-oriented approach is tracking the indexes on a character-by-character basis. The approach here is simple. Just repeatedly call fgetc filling the available characters in texto and then discarding any additional characters in the line until the '\n' or EOF is reached. It can actually provide a simpler, but equally robust solution compared to a line-oriented approach in the right circumstance. I'll leave investigating this approach to you.
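As a starting point for that investigation, here is one possible sketch of the fgetc approach. The function name read_rows_fgetc and its parameters are assumptions chosen for this example. Unlike the sscanf-based version, it stores an empty line as an empty row rather than skipping it.

```c
#include <assert.h>
#include <stdio.h>

#define NCOLS 45

/* Hypothetical sketch: fill rows one character at a time. Each row keeps
 * at most NCOLS-1 characters; the rest of an over-long line is discarded.
 * Returns the number of rows stored. */
size_t read_rows_fgetc (FILE *fp, char rows[][NCOLS], size_t maxrows)
{
    size_t n = 0, col = 0;
    int c;                          /* int, not char, so EOF is detectable */

    assert (fp && rows);
    while (n < maxrows && (c = fgetc (fp)) != EOF) {
        if (c == '\n') {            /* end of line: terminate row, advance */
            rows[n++][col] = '\0';
            col = 0;
        }
        else if (col < NCOLS - 1)   /* room left in this row: store char */
            rows[n][col++] = (char)c;
        /* otherwise the row is full and the character is dropped */
    }
    if (col && n < maxrows)         /* last line had no trailing '\n' */
        rows[n++][col] = '\0';
    return n;
}
```

Calling it as read_rows_fgetc (fp, texto, NROWS) would replace the read loop in either program above. The two indexes, n for the row and col for the position within the row, are the bookkeeping mentioned in the paragraph.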

The key in any input task in C is matching the right set of tools with the job. If you are guaranteed that the input file has a fixed format that never deviates, then formatted input can be effective. For all other input (including user input), line-oriented input is generally recommended because of its ability to read a full line without leaving a '\n' dangling in the input buffer unread -- provided you use an adequately sized buffer. Character-oriented input can always be used, but you have the added challenge of keeping track of indexing on a character-by-character basis. Using all three is the only way to develop an understanding of which is the best tool for the job.

Look things over and let me know if you have further questions.

Upvotes: 2

BladeMight

Reputation: 2810

You are assigning the result of malloc to a fixed-size array, which is impossible: an array already has its storage and cannot be assigned to. You should declare texto as char* in order to use malloc. The purpose of malloc is to allocate memory, and allocating memory for a fixed array is not possible.
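For comparison, this is roughly what the malloc route could look like once texto is a pointer. It is only a sketch under my own assumptions: the function name slurp is made up, and instead of counting the characters first and rewinding, it grows the buffer with realloc as it reads.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything remaining in fp into a malloc'ed,
 * nul-terminated buffer. The caller must free() the result.
 * Returns NULL if an allocation fails. */
char *slurp (FILE *fp)
{
    size_t cap = 256, len = 0;
    char *texto;
    int c;                              /* int so EOF can be told apart */

    assert (fp);
    texto = malloc (cap);
    if (!texto)
        return NULL;

    while ((c = fgetc (fp)) != EOF) {
        if (len + 1 == cap) {           /* keep room for the terminator */
            char *tmp = realloc (texto, cap * 2);
            if (!tmp) {
                free (texto);
                return NULL;
            }
            texto = tmp;
            cap *= 2;
        }
        texto[len++] = (char)c;
    }
    texto[len] = '\0';
    return texto;
}
```

This gives you the whole file as one string, newlines included, so it does not by itself enforce the 15 x 45 limit from the question.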

Here is an example of how to read the text file into a 2D array (one word per row):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    char texto[256][256]; // 256 - Big enough array, or use malloc for dynamic array size
    int count = 0;
    FILE *f = fopen("texto.txt", "r");

    if(f == NULL) {
        printf("ERRO ao abrir o ficheiro para leitura"); // "ERROR opening the file for reading"
        return (EXIT_FAILURE);
    }

    // Read one whitespace-delimited word per row; %255s leaves room for the terminator.
    while(count < 256 && fscanf(f, "%255s", texto[count]) == 1)
        count++;

    fclose(f);

    // Now let's print all the words in reverse order.
    for (int i = count - 1; i >= 0; i--) {
        printf("%s, ", texto[i]);
    }
    return (0);
}

output:

ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem,

Upvotes: 0
