Reputation: 448
I'm trying to read and parse a file line by line.
I only want to use simple syscalls (read, open, close, ...) and not fgets or getc, because I wish to learn the fundamentals, in a way. (I looked at some answers to similar questions, but they all use fgets and such.)
Here's what I have at the moment: a function I wrote that stores 1024 chars from a file in a buffer.
int main(void) {
    const char *filename = "file.txt";
    int fd = open(filename, O_RDONLY);
    char *buffer = malloc(sizeof (char) * 1024);
    read(fd, buffer, 1024);
    printf("%s", buffer);
    close(fd);
    free(buffer);
}
How does one make a stop at a '\n' for instance?
I know that once I know where to stop, I can use lseek with the right offset to continue reading my file where I stopped.
I do not wish to store the whole file in my buffer and then parse it. I want to add a line in my buffer, then parse that line and realloc my buffer and keep on reading the file.
I was thinking of something like this, but I feel like it's badly optimized, and I'm not sure where to add the lseek afterwards:
char *line = malloc(sizeof (char) * 1024);
read(fd, buffer, 1);
int i = 0;
while(*buffer != '\n' && *buffer != '\0'){
    line[i] = *buffer;
    ++i;
    *buffer++;
    read(fd, buffer, 1); //Assuming i < 1024 and *buffer != NULL
}
/* lseek somewhere after, probably should make 2 for loops
** One loop till file isn't completely read
** Another loop inside that checks if the end of the line is reached
** At the end of second loop lseek to where we left
*/
Thanks :)
EDIT: Title for clarifications.
Upvotes: 1
Views: 9512
Reputation: 1
char *buffer = malloc(sizeof (char) * 1024);
read(fd, buffer, 1024);
printf("%s", buffer);
There are several errors in the above code.
First, malloc is not a syscall (and neither is perror(3)), and sizeof(char) is 1 by definition. If you want to use only syscalls (listed in syscalls(2)), you'll need to use mmap(2), and you should request virtual memory in multiples of the page size (see getpagesize(2) or sysconf(3)), which is often (but not always) 4 kilobytes.
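To illustrate the mmap(2) route, here is a minimal sketch of a page-granular allocation. The helper name page_alloc is hypothetical, and the code assumes a system that supports MAP_ANONYMOUS (Linux and the BSDs do); it is one possible approach, not the only one.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes MAP_ANONYMOUS in strict modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round 'want' up to a whole number of pages and
   map that much anonymous memory. Stores the mapped length in *got.
   Returns NULL on failure. */
void *page_alloc(size_t want, size_t *got)
{
    long page = sysconf(_SC_PAGESIZE);   /* page size, often 4096 */
    if (page <= 0 || want == 0)
        return NULL;

    size_t pg  = (size_t)page;
    size_t len = (want + pg - 1) / pg * pg;   /* round up to a page multiple */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *got = len;
    return p;   /* anonymous mappings start out zero-filled */
}
```

The region is released with munmap(p, len), using the length stored in *got rather than the size originally asked for.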
If you can use malloc, you should check for its failure, and you had better zero the obtained buffer, so at least:
const int bufsiz = 1024;
char*buffer = malloc(bufsiz);
if (!buffer) { perror("malloc"); exit(EXIT_FAILURE); };
memset(buffer, 0, bufsiz);
Then, and more importantly, read(2) returns a count that you should always check (at least for failure):
ssize_t rdcnt = read(fd, buffer, bufsiz);
if (rdcnt<0) { perror("read"); exit(EXIT_FAILURE); };
If rdcnt is positive, you'll generally advance some pointer by rdcnt bytes. A zero count means end-of-file.
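The rule about always using the count can be sketched as a small loop. The helper name read_full is hypothetical; the sketch assumes a blocking descriptor and retries when a signal interrupts the call.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read(2) until 'count' bytes have
   arrived, end-of-file is reached, or a real error occurs.
   Returns the number of bytes stored in buf, or -1 on error. */
ssize_t read_full(int fd, char *buf, size_t count)
{
    size_t total = 0;

    while (total < count) {
        ssize_t rdcnt = read(fd, buf + total, count - total);
        if (rdcnt < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;           /* genuine failure, errno is set */
        }
        if (rdcnt == 0)
            break;               /* a zero count means end-of-file */
        total += (size_t)rdcnt;  /* advance past the bytes just read */
    }
    return (ssize_t)total;
}
```

A return value smaller than count then means the file ended early, which the caller can tell apart from an error (-1).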
Finally, your printf uses <stdio.h>, and you might use write(2) instead. If you use printf, remember that it is buffered: either end the format with a \n or call fflush(3). Also be sure the string ends with a zero byte. One possibility is to pass bufsiz-1 to your read; since we zeroed the buffer beforehand, we are sure to have a terminating zero byte.
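As a sketch of the write(2) alternative: because write takes an explicit byte count, you can pass the count returned by read and no terminating zero byte is needed. The helper name write_all is hypothetical; it loops because write may accept fewer bytes than requested.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write exactly 'count' bytes from buf to fd,
   looping over short writes. Returns count on success, -1 on error. */
ssize_t write_all(int fd, const char *buf, size_t count)
{
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;    /* interrupted by a signal: retry */
            return -1;       /* genuine failure, errno is set */
        }
        done += (size_t)n;   /* skip the bytes already written */
    }
    return (ssize_t)done;
}
```

With this in place, the printf call could become write_all(STDOUT_FILENO, buffer, (size_t)rdcnt), which prints only the bytes that were actually read.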
BTW, you could study the source code of some free software implementation of the C standard library, such as musl-libc or GNU libc.
Don't forget to compile with all warnings and debug info (gcc -Wall -Wextra -g), and to use the debugger (gdb), perhaps also valgrind and strace(1).
Upvotes: 2
Reputation: 726839
You are essentially implementing your own version of fgets. What lets fgets avoid reading non-seekable streams character by character is an internal buffer associated with the FILE* data structure.
Internally, fgets uses a function to fill that buffer using "raw" input-output routines. After that, fgets goes through the buffer character by character to determine the location of '\n', if any. Finally, fgets copies the content from the internal buffer into the user-supplied buffer (at most one character fewer than the buffer's size, so there is always room) and null-terminates the result.
In order to re-create this logic, you would need to define your own FILE-like struct with a pointer to a buffer and a pointer indicating the current location inside the buffer. After that, you would need to define your own version of fopen, which initializes the buffer and returns the struct to the caller. You would also need to write your own version of fclose to free the buffer. Once all of this is in place, you can implement your fgets by following the logic outlined above.
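A minimal sketch of that design follows. All names (MYFILE, my_fdopen, my_open, my_close, my_gets, MYBUFSZ) are hypothetical, the buffer is embedded in the struct and tracked with indices rather than pointers, and malloc is used for brevity. It illustrates the idea and is not how any particular libc implements fgets.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define MYBUFSZ 4096   /* assumed internal buffer size */

/* Hypothetical FILE-like structure: a descriptor plus an internal buffer. */
typedef struct {
    int    fd;
    size_t pos;            /* index of the next unread byte in buf */
    size_t len;            /* number of valid bytes in buf */
    char   buf[MYBUFSZ];
} MYFILE;

/* Wrap an already-open descriptor. Returns NULL if allocation fails. */
MYFILE *my_fdopen(int fd)
{
    MYFILE *f = malloc(sizeof *f);
    if (!f)
        return NULL;
    f->fd = fd;
    f->pos = 0;
    f->len = 0;
    return f;
}

/* fopen-like: open 'path' read-only and attach an empty buffer. */
MYFILE *my_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;
    MYFILE *f = my_fdopen(fd);
    if (!f)
        close(fd);
    return f;
}

/* fclose-like: close the descriptor and free the structure. */
int my_close(MYFILE *f)
{
    int rc = close(f->fd);
    free(f);
    return rc;
}

/* fgets-like: copy at most size-1 bytes into dst, stopping after '\n'.
   Always null-terminates. Returns dst, or NULL if nothing was read. */
char *my_gets(char *dst, size_t size, MYFILE *f)
{
    size_t n = 0;

    if (size == 0)
        return NULL;
    while (n + 1 < size) {
        if (f->pos == f->len) {   /* buffer drained: refill with one read */
            ssize_t got = read(f->fd, f->buf, sizeof f->buf);
            if (got <= 0)
                break;            /* end-of-file or error */
            f->pos = 0;
            f->len = (size_t)got;
        }
        char c = f->buf[f->pos++];
        dst[n++] = c;
        if (c == '\n')
            break;                /* keep the newline, as fgets does */
    }
    if (n == 0)
        return NULL;
    dst[n] = '\0';
    return dst;
}
```

Because unread bytes stay in the internal buffer between calls, no lseek is needed, so the same code also works on pipes and other non-seekable descriptors.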
Upvotes: 2
Reputation: 84579
If you are going to use read to read a line at a time (what fgets or getline are intended to do), you must keep track of the offset within the file after you locate each '\n'. It is then just a matter of reading a line at a time, beginning the next read at the offset following the current line.
I understand wanting to be able to use the low-level functions as well as fgets and getline. What you find is that you basically end up re-coding (in a less efficient way) what is already done in fgets and getline. But it is certainly good learning. Here is a short example:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define BUFSZ 128

ssize_t readline (char *buf, size_t sz, char *fn, off_t *offset);

int main (int argc, char **argv) {
    if (argc < 2) return 1;
    char line[BUFSZ] = {0};
    off_t offset = 0;
    ssize_t len = 0;
    size_t i = 0;
    /* using open/read, read each line in file into 'line' */
    while ((len = readline (line, BUFSZ, argv[1], &offset)) != -1)
        printf (" line[%2zu] : %s (%zd chars)\n", i++, line, len);
    return 0;
}

/* read up to 'sz - 1' bytes from file 'fn' beginning at file 'offset'
   storing all chars in 'buf', where 'buf' is terminated at
   the first newline found. On success, returns number of
   characters read, -1 on error or EOF with 0 chars read.
*/
ssize_t readline (char *buf, size_t sz, char *fn, off_t *offset)
{
    int fd = open (fn, O_RDONLY);
    if (fd == -1) {
        fprintf (stderr, "%s() error: file open failed '%s'.\n",
                 __func__, fn);
        return -1;
    }
    ssize_t nchr = 0;
    ssize_t idx = 0;
    char *p = NULL;

    /* position fd & read line; read at most sz - 1 bytes so the
       terminating 0 written below always fits inside buf */
    if ((nchr = lseek (fd, *offset, SEEK_SET)) != -1)
        nchr = read (fd, buf, sz - 1);
    close (fd);
    if (nchr == -1) { /* read error */
        fprintf (stderr, "%s() error: read failure in '%s'.\n",
                 __func__, fn);
        return nchr;
    }
    /* end of file - no chars read
       (not an error, but return -1 )*/
    if (nchr == 0) return -1;

    p = buf; /* check each char */
    while (idx < nchr && *p != '\n') p++, idx++;
    *p = 0;
    if (idx == nchr) { /* newline not found */
        *offset += nchr;
        /* check file missing newline at end */
        return nchr < (ssize_t)(sz - 1) ? nchr : 0;
    }
    *offset += idx + 1;
    return idx;
}
Example Input
The following datafiles are identical except the second contains a blank line between each line of text.
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
$ cat dat/captnjack2.txt
This is a tale

Of Captain Jack Sparrow

A Pirate So Brave

On the Seven Seas.
Output
$ ./bin/readfile dat/captnjack.txt
line[ 0] : This is a tale (14 chars)
line[ 1] : Of Captain Jack Sparrow (23 chars)
line[ 2] : A Pirate So Brave (17 chars)
line[ 3] : On the Seven Seas. (18 chars)
$ ./bin/readfile dat/captnjack2.txt
line[ 0] : This is a tale (14 chars)
line[ 1] : (0 chars)
line[ 2] : Of Captain Jack Sparrow (23 chars)
line[ 3] : (0 chars)
line[ 4] : A Pirate So Brave (17 chars)
line[ 5] : (0 chars)
line[ 6] : On the Seven Seas. (18 chars)
Upvotes: 5