Reputation: 9791
I want to read a file line by line, without knowing the line length in advance. Here's what I've got so far:
char buffer[4095];
int length = 0;
int ch;
// Read and test each character before storing it, so the first character
// is kept and the newline/EOF is not written into the buffer.
while ((ch = getc(file)) != '\n' && ch != EOF) {
    buffer[length] = (char)ch;
    length++;
}
printf("Line length: %d characters.", length);
char newbuffer[length + 1];
for (int i = 0; i < length; i++)
    newbuffer[i] = buffer[i];
newbuffer[length] = '\0'; // newbuffer now contains the line.
I can now figure out the line length, but only for lines shorter than 4095 characters, and the two char arrays seem like an awkward way of doing the task. Is there a better way to do this? (I already used fgets() but was told it wasn't the best way.)
--Ry
Upvotes: 29
Views: 40741
Reputation: 14452
Consider the scanf '%m' format conversion modifier (POSIX):
char *arr = NULL;
// Read unlimited string, terminated with newline. Similar to dynamic size fgets.
if (fscanf(stdin, "%m[^\n]", &arr) == 1) {
    // Do something with arr
    free(arr);
}
Quoting from the scanf man page:
An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required
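One thing the snippet above does not cover is that `%m[^\n]` stops before the newline and matches nothing on an empty line. A minimal sketch of a wrapper that handles both cases, assuming a libc that supports the POSIX.1-2008 `m` modifier (the name `scan_line` is made up for illustration):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the next line from `file` without its
 * newline, or NULL at end of input. The caller frees the result. */
char *scan_line(FILE *file)
{
    char *line = NULL;
    int matched = fscanf(file, "%m[^\n]", &line);

    if (matched == EOF)
        return NULL;            /* no input left (or a read error) */

    (void)getc(file);           /* consume the '\n' that %[^\n] left behind */

    if (matched == 0)           /* empty line: %[ matched no characters */
        line = calloc(1, 1);    /* return an empty string instead */

    return line;
}
```

Calling `scan_line(file)` in a loop until it returns NULL walks the file one line at a time; each returned string has to be released with `free`.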
Upvotes: 1
Reputation: 455122
You can start with a suitable size of your choice and then use realloc midway if you need more space:
int CUR_MAX = 4095;
char *buffer = malloc(CUR_MAX); // allocate buffer.
int length = 0;
int ch;
while ((ch = getc(file)) != '\n' && ch != EOF) { // read from stream.
    if (length == CUR_MAX) { // time to expand?
        CUR_MAX *= 2; // double the current size, or anything similar.
        buffer = realloc(buffer, CUR_MAX); // reallocate memory.
    }
    buffer[length] = ch; // stuff in buffer.
    length++;
}
.
.
free(buffer);
You'll have to check for allocation errors after calls to malloc and realloc.
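As a rough sketch of what that checking could look like, here is the same doubling idea wrapped in a function. The name `read_line` and the starting size of 64 are arbitrary choices for illustration, not part of the answer above:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from `file` into a heap buffer that
 * doubles whenever it fills up. Returns NULL at end of input or on
 * allocation failure; the caller frees the result. */
char *read_line(FILE *file)
{
    size_t capacity = 64;                   /* arbitrary starting size */
    size_t length = 0;
    char *buffer = malloc(capacity);
    if (buffer == NULL)
        return NULL;

    int ch;
    while ((ch = getc(file)) != '\n' && ch != EOF) {
        if (length + 1 == capacity) {       /* keep room for the '\0' */
            char *bigger = realloc(buffer, capacity * 2);
            if (bigger == NULL) {           /* the old block is still valid */
                free(buffer);
                return NULL;
            }
            buffer = bigger;
            capacity *= 2;
        }
        buffer[length++] = (char)ch;
    }

    if (ch == EOF && length == 0) {         /* nothing left to read */
        free(buffer);
        return NULL;
    }
    buffer[length] = '\0';
    return buffer;
}
```

Assigning the result of realloc to a temporary first matters: writing `buffer = realloc(buffer, ...)` directly would lose the only pointer to the old block if the call fails.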
Upvotes: 18
Reputation: 11
This is how I did it for stdin. If you call it with *line set to NULL and *length set to 0, the function allocates a 1024-byte buffer for you and lets it grow in steps of 1024. If you call it with *length set to 10, you get a buffer that grows in steps of 10. If you already have a buffer, you can pass it in along with its size.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>

char *readLine(char **line, size_t *length)
{
    assert(line != NULL);
    assert(length != NULL);

    size_t count = 0;

    *length = *length > 0 ? *length : 1024;
    const size_t step = *length; // grow in steps of the initial size

    if (!*line)
    {
        *line = calloc(*length, sizeof(**line));
        if (!*line)
        {
            return NULL;
        }
    }
    else
    {
        memset(*line, 0, *length);
    }

    for (int ch = getc(stdin); ch != '\n' && ch != EOF; ch = getc(stdin))
    {
        if (count + 1 == *length) // keep one byte free for the terminator
        {
            char *tmp = realloc(*line, *length + step);
            if (!tmp)
            {
                return NULL; // *line is still valid and owned by the caller
            }
            *line = tmp;
            *length += step;
        }

        (*line)[count] = (char)ch;
        ++count;
    }

    (*line)[count] = '\0'; // realloc'd memory is not zeroed, so terminate explicitly
    return *line;
}
Upvotes: 1
Reputation: 90015
You might want to look into Chuck B. Falconer's public domain ggets library. If you're on a system with glibc, you probably have a getline function available to you (not part of ISO C, but standardized in POSIX.1-2008).
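For reference, a minimal sketch of getline in use, assuming a POSIX.1-2008 system. The helper `longest_line` is a made-up example that reports the length of the longest line in a file:

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in `file`,
 * not counting the trailing newline. */
size_t longest_line(FILE *file)
{
    char *line = NULL;           /* getline() allocates and grows this */
    size_t capacity = 0;
    size_t longest = 0;
    ssize_t n;

    while ((n = getline(&line, &capacity, file)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            line[--n] = '\0';    /* strip the trailing newline */
        if ((size_t)n > longest)
            longest = (size_t)n;
    }

    free(line);                  /* one buffer is reused for every line */
    return longest;
}
```

getline keeps the newline in the returned text and reuses the same buffer on every call, so a line that needs to outlive the next call has to be copied first.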
Upvotes: 6
Reputation: 67380
You're close. Basically you want to read chunks of data and check them for \n characters. If you find one, good, you have an end of line. If you don't, you have to increase your buffer (i.e., allocate a new buffer twice the size of the first one, copy the data from the first one into the new one, then delete the old buffer and treat the new buffer as the old one; or just use realloc if you're in C), then read some more until you do find an ending.
Once you have your ending, the text from the beginning of the buffer to the \n character is your line. Copy it to another buffer or work on it in place, it's up to you.
After you're ready for the next line, you can copy the "rest" of the input over the current line (basically a left shift) and fill the rest of the buffer with data from the input. You then go again until you run out of data.
This can of course be optimized, with a circular buffer for example, but it should be more than sufficient for any reasonable I/O-bound algorithm.
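The steps above can be sketched roughly as follows. The `LineReader` struct, the `next_line` function and the initial size of 64 are invented for illustration; the sketch reads chunks with fread, grows the buffer with realloc, and uses memmove for the left shift:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader state; initialize with { .file = f }. */
typedef struct {
    FILE *file;
    char *data;        /* buffer holding `used` bytes of pending input */
    size_t used;
    size_t capacity;
    size_t consumed;   /* size of the line returned by the previous call */
} LineReader;

/* Returns the next line (NUL-terminated, valid until the next call), or
 * NULL at end of input or on allocation failure. Free r->data when done. */
char *next_line(LineReader *r)
{
    if (r->consumed > 0) {
        /* Left shift: move the leftover input over the previous line. */
        memmove(r->data, r->data + r->consumed, r->used - r->consumed);
        r->used -= r->consumed;
        r->consumed = 0;
    }

    size_t scanned = 0;    /* bytes already checked for '\n' */
    for (;;) {
        if (r->used > scanned) {
            char *nl = memchr(r->data + scanned, '\n', r->used - scanned);
            if (nl != NULL) {
                *nl = '\0';
                r->consumed = (size_t)(nl - r->data) + 1;
                return r->data;
            }
            scanned = r->used;
        }

        if (r->used + 1 >= r->capacity) {   /* full: double the buffer */
            size_t bigger = r->capacity ? r->capacity * 2 : 64;
            char *p = realloc(r->data, bigger);
            if (p == NULL)
                return NULL;
            r->data = p;
            r->capacity = bigger;
        }

        /* Read the next chunk, leaving one byte free for a terminator. */
        size_t got = fread(r->data + r->used, 1,
                           r->capacity - r->used - 1, r->file);
        r->used += got;
        if (got == 0) {                     /* end of input or read error */
            if (r->used == 0)
                return NULL;
            r->data[r->used] = '\0';        /* last line had no '\n' */
            r->consumed = r->used;
            return r->data;
        }
    }
}
```

A caller would loop with `while ((line = next_line(&reader)) != NULL)` and work on each line in place, copying it only if it has to survive the next call.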
Upvotes: 1