
Reputation: 151

Difference between fgetc() and read() function in C

While researching file I/O in C, I came across two functions: fgetc() and read().

//code1.c
#include <stdio.h>

int main(void)
{
  char ch;
  
  ch = fgetc(stdin);

  return 0;
}

//code2.c
#include <unistd.h>

int main(void)
{
  char ch;

  read(STDIN_FILENO, &ch, 1);

  return 0;
}

In both of the above programs, if I enter hello:

The first one will store the first character of the keyboard input (stdin) in ch, and the program will just terminate. That is, ch will contain h and the remaining characters will just disappear.
In the second program, the first character of the keyboard input will also be stored in ch, but the remaining characters (ello) will not disappear. Instead, they will be passed on to the terminal as a command after the program terminates.

I am not able to understand why this is happening. Is it something related to how inputs are buffered in C (and by computers in general)?

Upvotes: 14

Answers (4)

Peter Cordes

Reputation: 365487

fgetc is a C stdio library function, so it goes through the input buffering that stdio does for FILE *stdin.

You can use strace to see what system calls your process makes. read is an actual system call; the libc wrapper for it just passes its args on to the kernel. (On x86-64 Linux, essentially mov eax, 0 (__NR_read) ; syscall ; ret, plus setting errno if the kernel returned an error.) So your read program will just do that one system call (after libc startup), but fgetc has to make its own read call to get bytes from stdin.
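To make the "thin wrapper" point concrete, here is a minimal sketch that asks the kernel for bytes through the generic syscall() entry point instead of the read() wrapper. It assumes Linux with glibc; raw_read is a hypothetical helper name, not something from the question or from libc.

```c
#define _DEFAULT_SOURCE 1   /* exposes syscall() in glibc's <unistd.h> */
#include <sys/syscall.h>    /* SYS_read */
#include <sys/types.h>      /* ssize_t */
#include <unistd.h>         /* syscall() */

/* Hypothetical helper (Linux-specific): request up to n bytes from fd
   without going through the read() wrapper. Like read(), it returns the
   number of bytes obtained, 0 at EOF, or -1 with errno set. */
static ssize_t raw_read(int fd, void *buf, size_t n)
{
    return (ssize_t)syscall(SYS_read, fd, buf, n);
}
```

Calling raw_read(STDIN_FILENO, &ch, 1) in place of the read() call in code2.c should show up as the same single read(0, ..., 1) line under strace, since both paths end in the same system call.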

On my x86-64 Arch GNU/Linux system:

$ strace -o fgetc.tr  ./a.out
hello<enter>
$ tail fgetc.tr
...
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x753744d52000, 278107)          = 0
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x9), ...}) = 0
getrandom("\x62\xf6\x57\x14\x0d\xb2\x41\x75", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0x59eadd786000
brk(0x59eadd7a7000)                     = 0x59eadd7a7000
read(0, "hello\n", 1024)                = 6
lseek(0, -5, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
exit_group(0)                           = ?

The last brk() calls (finding the current break then moving it) might have been before main started, or might have been allocating space on demand for the stdin buffer on first use. Even the fstat(0, ...) might have been part of the fgetc, since that's also querying what kind of file it is (in this case a character device file.)

Because stdin is buffered (by default), glibc uses a 1024-byte read system call.

fd 0 was connected to a terminal, so the system call blocked until I hit enter, because the terminal is in "cooked" mode¹ and there weren't already any queued keystrokes / input. If it had been a regular file, the read system call wouldn't stop at newlines, only at EOF or the requested size. (Or if the requested size and the file were huge, at some kernel-chosen size limit for a single read.)

TTYs have a buffer so you can type even when there isn't a process blocked on a read system call. (And in "cooked" mode there's even line editing, like backspace, before the end-of-line character (normally newline) or end-of-file character (normally ctrl-D) submits the line.)

If the TTY buffer isn't empty when your process exits, those characters will still be there for the shell to read from it, since your process and the shell both have their stdin connected to the same TTY.

Footnote 1: Cooked as opposed to raw mode, which is what your shell uses for its own prompt, or what an editor like vim would use. You could use stty -a < /dev/pts/9 from another terminal while your process is running vs. while you're at the shell prompt to see the different settings, where /dev/pts/9 is the tty for the xterm or SSH session or whatever you're using. One easy way to find out the right path is ls -l /proc/self/fd and look at the symlink names for where ls's stdin/out/err refer to.
And BTW, stty operates on its stdin, printing output (if any) on its stdout. That's why we redirect from the terminal we want to query or set.

The shell itself puts the terminal in "cooked" mode before starting a command, because that's the default environment for stuff like cat >> foo.txt which lets you type something with line-editing into a file, or programs that print a prompt and wait to read a multi-character response. So strace on your own program won't show ioctl system calls for that.
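The same check that stty -a does by eye can be done from inside a program. This is a minimal sketch using the POSIX termios interface; tty_is_canonical is a hypothetical helper name, and testing only the ICANON flag is a simplification of what "cooked" covers.

```c
#include <termios.h>   /* tcgetattr(), struct termios, ICANON */
#include <unistd.h>

/* Hypothetical helper: 1 if fd is a terminal in canonical ("cooked",
   line-at-a-time) mode, 0 if it is a terminal in non-canonical mode,
   -1 if fd is not a terminal or the query failed. */
static int tty_is_canonical(int fd)
{
    struct termios t;
    if (tcgetattr(fd, &t) != 0)
        return -1;
    return (t.c_lflag & ICANON) ? 1 : 0;
}
```

Calling tty_is_canonical(STDIN_FILENO) at the top of main in either program from the question would be one way to confirm which mode the shell left the terminal in before starting the command.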

Upvotes: 3

Hrudu Shibu

Reputation: 61

fgetc(stdin) uses buffered input: on a terminal it typically reads the whole available line into a user-space buffer and returns one character at a time from it. read(STDIN_FILENO, &ch, 1) is unbuffered: it reads only one character and leaves the rest for future input processing.

Upvotes: 6

Jonathan Leffler

Reputation: 754780

Yes, it is related to how inputs are buffered.

The standard I/O package functions for reading data (fgetc() et al) will wait until data is made available by the terminal driver and will then read the available data from the terminal (usually a line full — with your example, the characters 'h', 'e', 'l', 'l', 'o', '\n') and the fgetc() function will return the first, the 'h'. Consequently, the other characters are not available to other programs.

The system call read() will also wait for the terminal driver to make data available, but then reads only the first character, leaving the other characters available to other programs.

On POSIX-based systems, the fgetc() function typically uses the read() system call (indirectly) to get the data from the terminal, but it usually requests up to a buffer-full of data, which could be anywhere from 512 up to 8192 characters requested (or it could be bigger; it will usually be a power of two), but the read() call will return with what's available. That's usually much less than a buffer-full when the input is a terminal. The rules are somewhat different when the input is a disk file, pipe or socket.
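How much a single fgetc() pulls from the kernel depends on the stream's buffering mode, which setvbuf() can change. The sketch below uses a pipe as a stand-in for the terminal so it can run unattended; bytes_left_after_fgetc is a hypothetical helper, and the behaviour described afterwards is what I would expect from common implementations such as glibc, not something the C standard promises.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: put "hello\n" in a pipe, wrap the read end in a
   stream with the given buffering mode (_IOFBF, _IOLBF or _IONBF), call
   fgetc() once, then count how many bytes a plain read() can still get
   from the descriptor. Returns -1 on any failure. */
static long bytes_left_after_fgetc(int mode)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello\n", 6) != 6) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);               /* writer done: an empty pipe now reads as EOF */

    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) {
        close(fds[0]);
        return -1;
    }

    long left = -1;
    /* setvbuf() has to be called before any other operation on the stream. */
    if (setvbuf(in, NULL, mode, BUFSIZ) == 0 && fgetc(in) == 'h') {
        char rest[16];
        left = (long)read(fds[0], rest, sizeof rest);
    }
    fclose(in);                  /* also closes fds[0] */
    return left;
}
```

With _IOFBF the one fgetc() should drain the whole pipe and leave 0 bytes behind; with _IONBF the stream should ask for a single byte, leaving the 5 bytes of ello\n for the next reader, which is the same effect as the read() program in the question.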

Note that the read() system call does not add a null byte to the end of the data, so what it reads are not strings.
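If the bytes are going to be treated as a C string, the terminator has to be added by hand. A minimal sketch, with read_as_string as a hypothetical helper name:

```c
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read() */

/* Hypothetical helper: read at most cap - 1 bytes from fd into buf and
   append the '\0' that read() itself never writes. Returns the number of
   bytes read (0 at EOF) or -1 on error or if cap is 0. */
static ssize_t read_as_string(int fd, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
    ssize_t n = read(fd, buf, cap - 1);   /* keep one byte for the terminator */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

The return value still matters after this: the input may itself contain zero bytes, in which case strlen(buf) would under-report how much was read.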

I've glossed over numerous details and caveats, seeking to keep my answer easy to understand while avoiding gross distortions of reality. There are ways to control the behaviour of terminals; I've described more or less what happens in the default case.

Upvotes: 11

ikegami

Reputation: 386541

I'm going to start with the difference between read and fread.

About fread:

fread is a stdio library function.
fread works with "streams" (what it calls FILE *).
Streams are usually associated with a file descriptor, but not always.
fread buffers.
fread may read more than requested, storing the excess in a buffer.
fread may perform multiple system calls.
fread may block if less data than requested is available. It will return the amount of data requested unless EOF is encountered or an error occurs. (I don't know if this is guaranteed behaviour.)

About read:

read is a unix system call.
read works with "file descriptors" (OS file handles).
read doesn't buffer.
read will not read more than requested.
read only performs one system call.
read returns immediately if data is available to be returned, even if the amount of data is less than the amount requested. (I don't know if this is guaranteed behaviour.)
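Since read may return fewer bytes than requested, code that needs a fixed amount usually wraps it in a loop, which is roughly what fread does internally. A minimal sketch, with read_full as a hypothetical helper name:

```c
#include <errno.h>
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read() */

/* Hypothetical helper: keep calling read() until `want` bytes have
   arrived, EOF is reached, or an error occurs. Returns the number of
   bytes actually stored in buf, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t want)
{
    char *p = buf;
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);
        if (n == 0)
            break;               /* EOF: return what we have so far */
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;        /* short read: loop and ask for the rest */
    }
    return (ssize_t)got;
}
```

On a terminal in cooked mode this would block across several typed lines until `want` bytes had arrived, which is the fread-like behaviour described in the first list.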

Where fgetc falls into this

fgetc is a function from the stdio library, just like fread. As such, the following are equivalent:

int rv = fgetc( stdin );
if ( rv == EOF ) {
   // Handle error or EOF.
} else {
   char ch = rv;
   // Do something with byte read.
}

char ch;
size_t rv = fread( &ch, 1, 1, stdin );   // Arguments: buffer, element size, element count, stream.
if ( rv == 0 ) {
   // Handle error or EOF.
} else {
   // Do something with byte read.
}

Because of the buffering performed by stdio functions, it's not wise to use both stdio and non-stdio functions with the same file descriptor.

About the difference in behaviour of your programs

Because fgetc and fread are buffering functions, they may read more than requested. This is why your program is absorbing ello\n. The excess is stored in the stream's buffer for future calls to fgetc and/or fread to return. None occur before the program exits, so the ello\n is lost.

Because read doesn't buffer, it doesn't read more than requested, and it doesn't consume the ello\n.
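The two claims above (the excess sits in the stream's buffer, and code going around stdio cannot see it) can be checked without a terminal. This is a sketch under the assumption that a pipe is an acceptable stand-in for the keyboard; struct leftover and where_did_the_rest_go are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct leftover {
    long seen_by_read;       /* bytes a raw read() on the fd could still get */
    char seen_by_stdio[16];  /* what a later stdio call on the stream returns */
};

/* Hypothetical helper: feed "hello\n" through a pipe, take one character
   with fgetc(), then look for the rest both ways. Returns 0 on success. */
static int where_did_the_rest_go(struct leftover *out)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello\n", 6) != 6) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);               /* writer done: an empty pipe now reads as EOF */

    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) {
        close(fds[0]);
        return -1;
    }

    int ok = -1;
    if (fgetc(in) == 'h') {
        char scratch[16];
        /* Going around stdio: nothing is left if stdio already took it all. */
        out->seen_by_read = (long)read(fds[0], scratch, sizeof scratch);
        /* Staying inside stdio: the excess is served from the stream's buffer. */
        if (fgets(out->seen_by_stdio, sizeof out->seen_by_stdio, in) != NULL)
            ok = 0;
    }
    fclose(in);                  /* also closes fds[0] */
    return ok;
}
```

On typical implementations seen_by_read comes back as 0 while seen_by_stdio holds ello\n. When the process exits instead of calling fgets, that buffered text is discarded along with the rest of the process's memory, which is why the shell never sees it.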

Upvotes: 8
