Reputation: 794
I need to ask one more question about reading from stdin. I am reading a huge number of lines from stdin, but the length of each line is unknown in advance. I don't want to allocate something like a 50 MiB buffer just because one file might have 50 MiB lines, while another file only has lines of three characters. So at the moment I have this code:
int cur_max = 2047;
char *str = malloc(sizeof(char) * cur_max);
int length = 0;
while(fgets(str, sizeof(str), stdin) != NULL) {
    //do something with str
    //for example printing
    printf("%s",str);
}
free(str);
So I am using fgets() for each line, with an initial size of 2047 characters per line. My plan is to increase the size of the buffer (str) when a line hits that limit: I count the length in length, and if the current length reaches cur_max, I double cur_max. The idea comes from Read line from file without knowing the line length. I am currently not sure how to do this with fgets(), because fgets() does not hand me the input char by char, so I don't know at which moment I should increase the size.
Upvotes: 1
Views: 2418
Reputation: 153338
sizeof(str) is the size of a pointer, like 2, 4 or 8 bytes. Instead, pass to fgets() the size of the memory str points to. @Andrew Henle @Steve Summit
char *str = malloc(sizeof(char) * cur_max);
...
// while(fgets(str, sizeof(str), stdin) != NULL
while(fgets(str, cur_max, stdin) != NULL
Text files and fgets() are not a portable solution for reading excessively long lines.
An implementation shall support text files with lines containing at least 254 characters, including the terminating new-line character. The value of the macro BUFSIZ shall be at least 256. (C11 §7.21.2 9)
So once the line length exceeds BUFSIZ - 2, code is on its own as to whether the C standard library functions can handle the text file.
So either read the data as binary, use other libraries that ensure the desired functionality, or rely on hope.
Note: BUFSIZ
defined in <stdio.h>
Upvotes: 2
Reputation: 39298
POSIX.1 getline()
(man 3 getline
) is available in almost all operating systems' C libraries (the only exception I know of is Windows). A loop to read lines of any length is:
char *line_ptr = NULL;
size_t line_max = 0;
ssize_t line_len;
while (1) {
    line_len = getline(&line_ptr, &line_max, stdin);
    if (line_len == -1)
        break;
    /* You now have 'line_len' chars at 'line_ptr',
       but it may contain embedded nul chars ('\0').
       Also, line_ptr[line_len] == '\0'.
    */
}
/* Discard dynamically allocated buffer; allow reuse later. */
free(line_ptr);
line_ptr = NULL;
line_max = 0;
There is also a related function getdelim()
, which takes an extra parameter (specified before the stream) used as the end-of-record marker. It is particularly useful in Unixy/POSIXy environments when reading file names from e.g. standard input, as you can use the nul character itself ('\0'
) as the separator (see e.g. find -print0
or xargs -0
), allowing correct handling of all possible file names.
If you use Windows, or if you have text files with varying newline conventions (not just '\n'
, but any of '\n'
, '\r'
, "\r\n"
, or "\n\r"
), you can use my getline_universal()
function implementation from another of my answers. It differs from standard getline()
and fgets()
in that the newline is not included in the line it returns; the newline is instead left in the stream and consumed/ignored by the next call to getline_universal()
. If you use getline_universal()
to read every line in a file or stream, it will work as expected.
Upvotes: 1