Hemmelig

Reputation: 794

Using fgets() without predefined buffer

I need to ask one more question about reading from stdin. I am reading a huge number of lines from stdin, but the length of each line is not known in advance. I don't want to allocate a buffer of around 50 million characters when one file only has lines of three characters, while another file really does use those 50 million per line. At the moment I have this code:

int cur_max = 2047;
char *str = malloc(sizeof(char) * cur_max);
int length = 0;

while(fgets(str, sizeof(str), stdin) != NULL) {
    //do something with str
    //for example printing
    printf("%s",str);
}

free(str);

So I am using fgets for every line, with an initial size of 2047 chars per line. My plan is to increase the size of the buffer (str) when a line hits the limit: I would track the line's size in length and, if the current length exceeds cur_max, double cur_max. The idea comes from "Read line from file without knowing the line length". I am not sure how to do this with fgets, because I think fgets does not work char by char, so I don't know at which moment to increase the size.

Upvotes: 1

Views: 2418

Answers (2)

chux

Reputation: 153338

Incorrect code

sizeof(str) is the size of a pointer, like 2, 4 or 8 bytes. Pass to fgets() the size of the memory pointed to by str. @Andrew Henle @Steve Summit

char *str = malloc(sizeof(char) * cur_max);
...
// while(fgets(str, sizeof(str), stdin) != NULL)
while(fgets(str, cur_max, stdin) != NULL)
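
With the size argument fixed, the doubling strategy from the question can be built on fgets() by checking whether the buffer ends in '\n' after each call. The following is only a sketch under some assumptions: read_full_line() is a hypothetical helper name, the initial size of 2048 is arbitrary, and lines are assumed to contain no embedded '\0' characters (strlen() would undercount them).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of any library: reads one whole line into
   *buf, doubling *cap whenever fgets() fills the buffer without reaching
   a newline. Returns the line length (including '\n' if present), or -1
   on end-of-file with nothing read, read error, or allocation failure.
   The caller owns *buf and must free() it. */
long read_full_line(char **buf, size_t *cap, FILE *in)
{
    size_t len = 0;

    assert(buf != NULL && cap != NULL && in != NULL);

    if (*buf == NULL || *cap < 2) {
        char *tmp = realloc(*buf, 2048);   /* arbitrary starting size */
        if (tmp == NULL)
            return -1;
        *buf = tmp;
        *cap = 2048;
    }

    for (;;) {
        size_t room = *cap - len;
        /* fgets() takes an int, so clamp very large buffers. */
        int chunk = room > INT_MAX ? INT_MAX : (int)room;

        if (fgets(*buf + len, chunk, in) == NULL)
            break;                          /* EOF or read error */

        len += strlen(*buf + len);
        if (len > 0 && (*buf)[len - 1] == '\n')
            return (long)len;               /* complete line */

        if (len + 1 == *cap) {              /* buffer full: double it */
            char *tmp;
            if (*cap > SIZE_MAX / 2)
                return -1;
            tmp = realloc(*buf, *cap * 2);
            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *cap *= 2;
        }
    }

    if (ferror(in) || len == 0)
        return -1;
    return (long)len;                       /* last line, no trailing '\n' */
}
```

A caller would start with char *buf = NULL; size_t cap = 0; and loop while read_full_line(&buf, &cap, stdin) != -1, reusing the same buffer for every line and calling free(buf) once at the end.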

Environmental limits

Text files and fgets() are not the portable solution for reading excessively long lines.

"An implementation shall support text files with lines containing at least 254 characters, including the terminating new-line character. The value of the macro BUFSIZ shall be at least 256." C11 §7.21.2 ¶9

So once the line length exceeds BUFSIZ - 2, code is on its own as to whether the C standard library functions can handle the text file.

So either read the data as binary, use other libraries that ensure the desired functionality, or rely on hope.

Note: BUFSIZ is defined in <stdio.h>.
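
As a rough illustration of the "read the data as binary" option, the sketch below scans a stream in fixed-size chunks with fread() and memchr(), so no single line ever has to fit in a buffer. The name longest_line() and the 4096-byte chunk size are assumptions made for this example, and the stream is assumed to have been opened in binary mode (for example with "rb"); switching stdin itself to binary mode is platform specific.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the length in bytes of the longest line in
   'in', not counting the '\n'. Lines may be arbitrarily long because only
   a running count is kept, never the line itself. */
size_t longest_line(FILE *in)
{
    char chunk[4096];                 /* arbitrary chunk size */
    size_t got, cur = 0, best = 0;

    assert(in != NULL);

    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        size_t pos = 0;
        while (pos < got) {
            char *nl = memchr(chunk + pos, '\n', got - pos);
            if (nl == NULL) {
                cur += got - pos;     /* line continues in the next chunk */
                break;
            }
            cur += (size_t)(nl - (chunk + pos));
            if (cur > best)
                best = cur;
            cur = 0;                  /* start counting the next line */
            pos = (size_t)(nl - chunk) + 1;
        }
    }

    if (cur > best)                   /* final line without trailing '\n' */
        best = cur;
    return best;
}
```

The same chunked loop could feed a growable buffer instead of a counter if the line contents are needed.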

Upvotes: 2

Nominal Animal

Reputation: 39298

POSIX.1 getline() (man 3 getline) is available in almost all operating systems' C libraries (the only exception I know of is Windows). A loop to read lines of any length is:

char    *line_ptr = NULL;
size_t   line_max = 0;
ssize_t  line_len;

while (1) {

    line_len = getline(&line_ptr, &line_max, stdin);
    if (line_len == -1)
        break;

    /* You now have 'line_len' chars at 'line_ptr',
       but it may contain embedded nul chars ('\0').
       Also, line_ptr[line_len] == '\0'.
    */
}

/* Discard dynamically allocated buffer; allow reuse later. */
free(line_ptr);
line_ptr = NULL;
line_max = 0;

There is also a related function, getdelim(), which takes an extra parameter (specified before the stream) that is used as the end-of-record marker. It is particularly useful in Unixy/POSIXy environments when reading file names from e.g. standard input, as you can use nul itself ('\0') as the separator (see e.g. find -print0 or xargs -0), allowing correct handling for all possible file names.
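
To make the getdelim() variant concrete, here is a minimal sketch of reading nul-separated records on a POSIX system. The helper name count_nul_records() and its total_len output parameter are hypothetical, chosen only for this example; a real program would process each record where the comment indicates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads '\0'-terminated records (such as the output
   of find -print0) from 'in'. Returns the number of records and stores
   the sum of their lengths, excluding the terminators, in *total_len. */
size_t count_nul_records(FILE *in, size_t *total_len)
{
    char   *rec = NULL;
    size_t  cap = 0, count = 0;
    ssize_t len;

    assert(in != NULL && total_len != NULL);
    *total_len = 0;

    while ((len = getdelim(&rec, &cap, '\0', in)) != -1) {
        /* The delimiter is included when present; the last record may
           lack it if the input ends without a terminator. */
        if (len > 0 && rec[len - 1] == '\0')
            len--;

        /* rec[0..len) is one record here; it may contain '\n'. */
        *total_len += (size_t)len;
        count++;
    }

    free(rec);
    return count;
}
```

Such a helper could be fed with something like find . -print0 piped into the program, with stdin passed as the stream.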

If you use Windows, or if you have text files with varying newline conventions (not just '\n', but any of '\n', '\r', "\r\n", or "\n\r"), you can use my getline_universal() function implementation from another of my answers. It differs from standard getline() and fgets() in that the newline is not included in the line it returns; it is also left in the stream and consumed/ignored by the next call to getline_universal(). If you use getline_universal() to read each line in a file or stream, it will work as expected.

Upvotes: 1
