Reputation: 454
Considering these 3 lines in a file:
This is the first line of a text.
Second line comes next.
File ends here.
I want to read those lines and store them in an array. The problem is that I don't know how long they are, so I can't malloc the space needed.
In the example given the lines are quite short, but consider that there may also be very long lines.
I don't want to malloc 1000 bytes and define that as the maximum length of a string. So is there any way I can find out the length of every line in order to malloc the appropriate space?
Note: I have considered using realloc, but isn't that a bad technique when the string is very long?
Upvotes: 1
Views: 371
Reputation: 84521
The standard approach for reading an unknown number of lines of unknown length from a file, while allocating only the storage required, is to allocate some reasonably anticipated number of pointers initially (using a pointer-to-pointer-to-char, i.e. a double pointer, char **lines;). You then read each line, allocate storage for it, and assign the address of that storage to the next available pointer. When you reach the limit of the pointers you have allocated, you realloc the block of pointers (generally to twice the current number) and keep going, repeating as required.
While you can use fgets, if you have POSIX getline available, it will handle reading a line of any length using its internal allocation, making your only job one of allocating a copy of the line and assigning that address to your next pointer. strdup makes that a snap. If strdup is not available, getline returns the number of characters it has read (e.g. nchr = getline (&line, &n, fp);), making it a simple task to copy the line yourself with char *buf = malloc (nchr + 1); strcpy (buf, line); (after validating the malloc).
A short example, including the necessary validations would be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NPTR 8

int main (int argc, char **argv) {

    size_t ndx = 0,             /* line index */
        nptrs = NPTR,           /* initial number of pointers */
        n = 0;                  /* line alloc size (0, getline decides) */
    ssize_t nchr = 0;           /* return (no. of chars read by getline) */
    char *line = NULL,          /* buffer to read each line */
        **lines = NULL;         /* pointer to pointer to each line */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate/validate initial 'nptrs' pointers */
    if (!(lines = calloc (nptrs, sizeof *lines))) {
        perror ("calloc-lines");
        return 1;
    }

    /* read each line with POSIX getline */
    while ((nchr = getline (&line, &n, fp)) != -1) {
        if (nchr && line[nchr - 1] == '\n') /* check trailing '\n' */
            line[--nchr] = 0;               /* overwrite with nul-char */
        char *buf = strdup (line);          /* allocate/copy line */
        if (!buf) {             /* strdup allocates, so validate */
            fprintf (stderr, "error: strdup allocation failed.\n");
            break;
        }
        lines[ndx++] = buf;     /* assign start address for buf to lines */
        if (ndx == nptrs) {     /* if pointer limit reached, realloc */
            /* always realloc to temporary pointer, to validate success */
            void *tmp = realloc (lines, sizeof *lines * nptrs * 2);
            if (!tmp) {         /* if realloc fails, bail with lines intact */
                perror ("realloc-lines");
                break;          /* don't exit, lines holds current lines */
            }
            lines = tmp;        /* assign reallocated block to lines */
            /* zero all new memory (optional) */
            memset (lines + nptrs, 0, nptrs * sizeof *lines);
            nptrs *= 2;         /* increment number of allocated pointers */
        }
    }
    free (line);                    /* free memory allocated by getline */

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < ndx; i++) {
        printf ("line[%3zu] : %s\n", i, lines[i]);
        free (lines[i]);            /* free memory for each line */
    }
    free (lines);                   /* free pointers */

    return 0;
}
Example Input File
$ cat dat/3lines.txt
This is the first line of a text.
Second line comes next.
File ends here.
Example Use/Output
$ ./bin/getline_readfile <dat/3lines.txt
line[ 0] : This is the first line of a text.
line[ 1] : Second line comes next.
line[ 2] : File ends here.
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through them.
$ valgrind ./bin/getline_readfile <dat/3lines.txt
==12179== Memcheck, a memory error detector
==12179== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==12179== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==12179== Command: ./bin/getline_readfile
==12179==
line[ 0] : This is the first line of a text.
line[ 1] : Second line comes next.
line[ 2] : File ends here.
==12179==
==12179== HEAP SUMMARY:
==12179== in use at exit: 0 bytes in 0 blocks
==12179== total heap usage: 5 allocs, 5 frees, 258 bytes allocated
==12179==
==12179== All heap blocks were freed -- no leaks are possible
==12179==
==12179== For counts of detected and suppressed errors, rerun with: -v
==12179== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Upvotes: 2
Reputation: 93456
A simple solution on systems that support it (all modern desktop operating systems, for example) is to memory-map the entire file. The file content is then directly addressable as memory, and the operating system's virtual memory management handles the memory and paging for you regardless of the size of the file.
You then either operate on the file content directly as if it were memory, with no explicit allocation, reallocation or even write-back (should you modify it) necessary, or you get each line's start and length by scanning for newlines, allocate the exact amount of memory required, and copy the line.
The Windows and POSIX APIs for memory-mapped files differ, but you will find plenty of examples for whatever system you are using.
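As a rough illustration of the scanning step on the POSIX side, here is a minimal sketch. The function name scan_lines_mmap is hypothetical (not from any library), and the sketch assumes a regular file that fits in the address space. It maps the file read-only, walks it with memchr, and reports the number of lines and the longest line length, which is exactly the information needed to malloc the right amount per line.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): map 'path' read-only, return the
 * number of lines and store the longest line length (excluding '\n') in
 * *max_len. Returns -1 on error. A final line without '\n' still counts. */
long scan_lines_mmap(const char *path, size_t *max_len)
{
    *max_len = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t size = (size_t)st.st_size;
    char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    const char *p = data, *end = data + size;
    while (p < end) {
        /* find the end of the current line inside the mapped region */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t len = nl ? (size_t)(nl - p) : (size_t)(end - p);
        if (len > *max_len)
            *max_len = len;
        count++;
        p = nl ? nl + 1 : end;      /* step past '\n', or stop at the end */
    }

    munmap(data, size);
    return count;
}
```

To copy a line out instead of only measuring it, you would malloc (len + 1) inside the loop, memcpy the len bytes starting at p, and add the terminating nul yourself, since the mapped data is not nul-terminated.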
Upvotes: 0
Reputation: 2628
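#include <stdio.h>  //Used for fopen, fseek, ftell, fread, fclose, perror
#include <stdlib.h> //Used for malloc and free
#include <assert.h> //Used for assert
int main(void)
{
    FILE* file = fopen("your_file_here.txt", "r"); //Open a file
    if (!file) {                                   //Ensure that the file was opened
        perror("fopen");
        return 1;
    }
    fseek(file, 0, SEEK_END);                      //Find the end of the file
    long filesize = ftell(file);                   //Save the position (length)
    fseek(file, 0, SEEK_SET);                      //Return to the beginning of the file
    if (filesize < 0) {                            //ftell returns -1 on failure
        perror("ftell");
        fclose(file);
        return 1;
    }
    char* buffer = malloc((size_t)filesize + 1);   //Allocate enough memory (+1 for '\0')
    assert(buffer);                                //Ensure that the memory was allocated
    //Fill the allocated memory with the file content; in text mode fewer than
    //filesize bytes may be read, so use the count fread actually returns
    size_t nread = fread(buffer, 1, (size_t)filesize, file);
    buffer[nread] = '\0';                          //Terminate so the content is a valid string
    //Do whatever you like
    //{...}
    free(buffer);                                  //Free up memory
    fclose(file);                                  //Close file
    return 0;
}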
Upvotes: 0
Reputation: 995
This is kind of funny, considering that the worry is over 1000 bytes, which is about 1 KB, while system RAM is likely to be measured in GB, and because of disk and file caching everything is likely to be loaded into RAM anyway. But here's a way to run through the file and find the maximum number of characters on what would be considered a line:
int Max_Line_Length_in_File ( FILE *fp )
{
    int ch;             /* int, not char, so EOF can be told apart from data */
    int count = 0;
    int maxcount = -1;  /* stays -1 if the file is empty */
    /* assumes fp is already opened and at beginning of text file */
    while ( ( ch = fgetc( fp ) ) != EOF )   /* read one character per pass */
    {
        if (( ch == '\n' ) || ( ch == '\0' ))
            count = 0;
        else
            count++;
        if ( count > maxcount )
            maxcount = count;
    }
    /* don't forget to do a rewind on the fileptr if needed */
    return maxcount;
}
Once you know the maximum line length, you can either do one malloc() sized for the longest line, or allocate separately for each and every line. You could also easily modify the above to add a counter for the number of lines found. So if memory is an issue, this approach needs fewer than a handful of variables, typically 4 bytes for an int each, so less than 16 bytes to get you number_of_lines and max_line_length.
Upvotes: 0
Reputation: 30906
Start by dynamically allocating a buffer sized to the expected average length of a line in your file, whatever that is. Then read char by char. When you reach the end of the malloc'ed memory, reallocate using realloc (doubling the size). After you find a \n, reallocate again to release the extra memory you asked for but did not need for this line. This way you can read the whole line and end up with exactly the memory necessary for storing it (by using malloc and realloc).
And to answer your comment about whether this is good enough: the number of realloc calls can be reduced considerably by allocating a large chunk the first time and shrinking it once the line has been read. And yes, calling realloc many times costs performance, but since the size doubles each time, the number of calls stays small unless the initial size is far too small.
There is also getline, which can do the hard part for you, but it is part of POSIX rather than standard C, so using it costs you portability. You can check the small example provided there to understand how to work with it.
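The doubling-then-shrinking idea described above could be sketched roughly as follows. The name read_line and the initial capacity of 16 are assumptions chosen for illustration, not part of any standard API. It uses only standard C, so it also works where getline is not available.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (illustration only): read one line of any length from
 * 'fp'. Returns a malloc'ed, nul-terminated string without the trailing
 * '\n', or NULL at end of file or on allocation failure. The caller frees
 * the result. */
char *read_line(FILE *fp)
{
    size_t cap = 16;    /* assumed initial guess, tune to your data */
    size_t len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;

    int ch;             /* int so EOF can be told apart from data */
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 == cap) {           /* keep one byte for the nul */
            /* realloc to a temporary so buf survives a failure */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;                   /* doubling keeps reallocs rare */
        }
        buf[len++] = (char)ch;
    }

    if (ch == EOF && len == 0) {        /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';

    /* shrink to the exact size; if shrinking fails the old block is valid */
    char *fit = realloc(buf, len + 1);
    return fit ? fit : buf;
}
```

Because the capacity doubles, a line of n characters triggers only about log2(n/16) reallocations, so even a line of a million characters needs around 16 of them. Each returned pointer can be stored in a char ** array that is grown the same way.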
Upvotes: 3