Reputation:
I'm trying to read an entire text file into a 2D array so that I can limit how much is stored and know where each new line starts (if anyone has a better idea, I'm open to suggestions).
This is what I have so far:
int main(int argc, char** argv) {
char texto[15][45];
char ch;
int count = 0;
FILE *f = fopen("texto.txt", "r");
if(f == NULL)
printf("ERRO ao abrir o ficheiro para leitura");
while((ch = fgetc(f) != EOF))
count++;
rewind(f);
int tamanho = count;
texto = malloc(tamanho *sizeof(char));
fscanf(f, "%s", texto);
fclose(f);
printf("%s", texto);
return (EXIT_SUCCESS);
}
And the text file is like this
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
But I get this error
error: assignment to expression with array type
here
texto = malloc(tamanho *sizeof(char));
Upvotes: 0
Views: 190
Reputation: 84561
The problem you have been given is designed to force you to understand the differences between, and the limitations of, character-oriented input, formatted input, and line-oriented input. You are setting your array limits as:
char texto[15][45];
The above declares an array of 15 1D arrays of 45 characters each, which will be sequential in memory (the definition of an array). That means at each index texto[0] - texto[14] you can store at most 45 characters (or a string of 44 characters followed by the nul-terminating character).
You are then given a file of seven lines of 45 characters each. But there are only 44 characters in each line? -- wrong. Since (presumably, given "texto.txt") the information is held within a text file, there will be an additional '\n' (newline) character at the end of each line. You must account for its presence in reading the file. Each line in the file will look something like the following:
        10        20        30        40
123456789012345678901234567890123456789012345
lorem ipsum lorem ipsum lorem ipsum lorem ip\n
(where the numbers simply represent a scale showing how many characters are present in each line)
The ASCII '\n' character is a single character, even though it is written with two characters in source code.
The formatted-input Approach
Can you read the input with fscanf using the "%s" conversion specifier? (Answer: no) Why? The "%s" conversion specifier stops reading when it encounters the first whitespace character after reading non-whitespace characters. That means reading with fscanf (fp, "%s", ...) will stop reading after the 5th character (the end of the first "lorem").
While you can remedy this by using the character-class conversion specifier of the form [...] where the brackets contain characters to be included (or excluded if the first character in the class is '^'), you leave the '\n' character unread in your input stream.
While you can remedy that by using the '*' assignment-suppression character to read and discard the next character (the newline) with "%*c", if you have any additional characters in the line, they too will remain in the input buffer (input stream, e.g. your file) unread.
Are you beginning to get the picture that doing file input with the scanf family of functions is inherently fragile? (you would be right)
A naive implementation using fscanf could be:
#include <stdio.h>

#define NROWS 15        /* if you need a constant, #define one (or more) */
#define NCOLS 45

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""};
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* read up to NROWS lines of 44 char each with at most 1 trailing char */
    while (n < NROWS && fscanf (fp, "%44[^\n]%*c", texto[n]) == 1)
        n++;    /* increment line count */

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored (%zu for size_t) */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);

    return 0;
}
(note: if you can guarantee that your input file format is fixed and never varies, then this can be an appropriate approach. However, a single additional stray character in the file can torpedo this approach)
Example Use/Output
$ ./bin/texto2dfscanf <dat/texto.txt
texto[ 0]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 1]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 2]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 3]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 4]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 5]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 6]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
Line-oriented Input
A better approach is always a line-oriented approach. Why? It allows you to separately validate the read of a line of data from your file (or from the user) and then validate parsing the necessary information from that line.
But there is an intentional catch in the sizing of texto that complicates a simplistic line-oriented approach. While you may be tempted to simply attempt reading each line of text into texto[0-14], you would only be reading the text into texto and leaving the '\n' unread. (What? I thought line-oriented input handles this? -- It does if you provide sufficient space in the buffer you are trying to fill...)
Line-oriented input functions (fgets and POSIX getline) read and include the trailing '\n' in the buffer being filled -- provided there is sufficient space. If using fgets, fgets will store no more than the specified number of characters in the buffer, including the nul-terminating character (which protects your array bounds). Your task here has been designed to require a buffer of 46 characters with a line-oriented function in order to hold:
the text + '\n' + '\0'
(the text plus the newline plus the nul-terminating character)
This forces you to do line-oriented input properly. Read the information into a buffer of sufficient size to handle the largest anticipated input line (and don't skimp on buffer size). Validate that your read succeeded. Then parse the information you need from the line in any manner you choose (sscanf is fine in this case). By doing it in this two-step manner, you can read the line, determine the original length of the line read (including the '\n') and validate whether it all fit in your buffer. You can then parse the 44 characters (plus room for the nul-terminating character).
Further, if additional characters remain unread, you know that up-front and can then continually read and discard the remaining characters in preparation for your next read.
A reasonable line-oriented approach could look something like the following:
#include <stdio.h>
#include <string.h>

#define NROWS 15        /* if you need a constant, #define one (or more) */
#define NCOLS 45
#define MAXC 1024

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""},
         buffer[MAXC] = "";
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (n < NROWS && fgets (buffer, MAXC, fp)) {
        size_t len = strlen (buffer);
        if (len && buffer[len-1] == '\n')   /* trim trailing '\n' */
            buffer[--len] = 0;
        else if (len == MAXC-1) {           /* line did not fit in buffer */
            int c;
            fprintf (stderr, "error: line %zu too long.\n", n + 1);
            /* remove remaining chars in this line before next read */
            while ((c = fgetc (fp)) != '\n' && c != EOF) {}
        }
        if (sscanf (buffer, "%44[^\n]", texto[n]) == 1)
            n++;
    }

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);

    return 0;
}
(the output is the same)
Character-oriented Input
The only method left is a character-oriented approach (which can be a very effective way of reading the file character-by-character). The only challenge with a character-oriented approach is tracking the indexes on a character-by-character basis. The approach here is simple. Just repeatedly call fgetc, filling the available characters in texto and then discarding any additional characters in the line until the '\n' or EOF is reached. It can actually provide a simpler, but equally robust solution compared to a line-oriented approach in the right circumstance. I'll leave investigating this approach to you.
The key in any input task in C is matching the right set of tools with the job. If you are guaranteed that the input file has a fixed format that never deviates, then formatted input can be effective. For all other input (including user input), line-oriented input is generally recommended because of its ability to read a full line without leaving a '\n' dangling in the input buffer unread -- provided you use an adequately sized buffer. Character-oriented input can always be used, but you have the added challenge of keeping track of indexing on a character-by-character basis. Using all three is the only way to develop an understanding of which is the best tool for the job.
Look things over and let me know if you have further questions.
Upvotes: 2
Reputation: 2810
You are assigning the result of malloc to a fixed-size array. That is impossible, since the array already has a fixed size and an array cannot be assigned to. You should declare texto as char* in order to use malloc. The purpose of malloc is to allocate memory, and allocating memory for a fixed array is not possible.
Here is an example of how to read the text file into a 2D array (one word per row):
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char texto[256][256]; // 256 - Big enough array, or use malloc for dynamic array size
    int count = 0;
    FILE *f = fopen("texto.txt", "r");
    if (f == NULL) {
        printf("ERRO ao abrir o ficheiro para leitura");
        return (EXIT_FAILURE);
    }
    // Store one whitespace-delimited word per row (at most 255 chars each).
    while (count < 256 && fscanf(f, "%255s", texto[count]) == 1)
        count++;
    fclose(f);
    // Now let's print them all in reverse order.
    for (int i = count - 1; i >= 0; i--) {
        printf("%s, ", texto[i]);
    }
    return (0);
}
output:
ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, 
Upvotes: 0