Reputation: 151
While researching file I/O in C, I came across two functions: fgetc() and read().
//code1.c
#include <stdio.h>
int main(void)
{
int ch; // int, not char: fgetc returns EOF (a negative int) on end-of-file or error
ch = fgetc(stdin);
return 0;
}
//code2.c
#include <unistd.h>
int main(void)
{
char ch;
read(STDIN_FILENO, &ch, 1);
return 0;
}
In both of the above programs, if I enter hello:
The first one will store the input from the keyboard (stdin) in ch, and the program will just terminate. This means that ch will contain h and the remaining characters will just disappear.
In the second program, the input from the keyboard will also be stored in ch. But the remaining characters (ello) will not disappear. Instead, they will be passed on to the terminal as a command after the program terminates.
I don't understand why this is happening. Is it something related to how input is buffered in C (and by computers in general)?
Upvotes: 14
Views: 1037
Reputation: 365487
fgetc is a C stdio library function that uses stdio's input buffer for FILE *stdin.
You can use strace to see what system calls your process makes. read is an actual system call; the libc wrapper for it just passes its args on to the kernel. (On x86-64, by doing mov eax, 0 (__NR_read); syscall; ret, and maybe some error handling to set errno.) So your read program will just make that one system call (after libc startup), but fgetc has to make its own read call to get bytes from stdin.
On my x86-64 Arch GNU/Linux system:
$ strace -o fgetc.tr ./a.out
hello<enter>
$ tail fgetc.tr
...
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x753744d52000, 278107) = 0
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x9), ...}) = 0
getrandom("\x62\xf6\x57\x14\x0d\xb2\x41\x75", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0x59eadd786000
brk(0x59eadd7a7000) = 0x59eadd7a7000
read(0, "hello\n", 1024) = 6
lseek(0, -5, SEEK_CUR) = -1 ESPIPE (Illegal seek)
exit_group(0) = ?
The last brk() calls (finding the current break, then moving it) might have happened before main started, or might have been allocating space on demand for the stdin buffer on first use. Even the fstat(0, ...) might have been part of the fgetc, since it's also querying what kind of file fd 0 is (in this case, a character device).
Because stdin is buffered (by default), glibc uses a 1024-byte read system call.
fd 0 was connected to a terminal, so the system call blocked until I hit Enter, because the terminal is in "cooked" mode (see footnote 1) and there weren't any keystrokes already queued. If it had been a regular file, the read system call wouldn't stop at newlines, only at EOF or the requested size. (Or, if the requested size and the file were huge, at some kernel-chosen size limit for a single read.)
TTYs have a buffer so you can type even when there isn't a process blocked on a read system call. (And in "cooked" mode there's even line editing, like backspace, before the end-of-line character (normally newline) or end-of-file character (normally ctrl-D) submits the line.)
If the TTY buffer isn't empty when your process exits, those characters will still be there for the shell to read from it, since your process and the shell both have their stdin connected to the same TTY.
Footnote 1: "Cooked" as opposed to raw mode, which your shell uses, as would an editor like vim. You can run stty -a < /dev/pts/9 from another terminal while your process is running, and again while you're at the shell prompt, to see the different settings, where /dev/pts/9 is the TTY for the xterm, SSH session, or whatever you're using. One easy way to find the right path is to run ls -l /proc/self/fd and look at where the symlinks for ls's stdin/stdout/stderr point.
And BTW, stty operates on its stdin and prints output, if any, on its stdout; that's why we redirect stdin from the terminal we want to query or set.
The shell itself puts the terminal in "cooked" mode before starting a command, because that's the default environment for things like cat >> foo.txt, which lets you type something into a file with line editing, or programs that print a prompt and wait to read a multi-character response. So strace on your own program won't show any ioctl system calls for that.
Upvotes: 3
Reputation: 61
fgetc(stdin) uses buffered input, reading whatever the terminal has available (normally a whole line) into a buffer and returning one character at a time, while read(STDIN_FILENO, &ch, 1) is unbuffered, reading only one character and leaving the rest for future input processing.
Upvotes: 6
Reputation: 754780
Yes, it is related to how inputs are buffered.
The standard I/O package functions for reading data (fgetc() et al.) will wait until data is made available by the terminal driver and will then read the available data from the terminal (usually a full line; with your example, the characters 'h', 'e', 'l', 'l', 'o', '\n'), and the fgetc() function will return the first one, the 'h'. Consequently, the other characters are no longer available to other programs.
The system call read() will also wait for the terminal driver to make data available, but then reads only the first character, leaving the other characters available to other programs.
On POSIX-based systems, the fgetc() function typically uses the read() system call (indirectly) to get the data from the terminal. It usually requests up to a buffer-full of data, which could be anywhere from 512 up to 8192 bytes (or more; it is usually a power of two), but the read() call returns with whatever is available. When the input is a terminal, that's usually much less than a buffer-full. The rules are somewhat different when the input is a disk file, pipe or socket.
Note that the read() system call does not add a null byte to the end of the data, so what it reads is not a string.
I've glossed over numerous details and caveats, seeking to keep my answer easy to understand while avoiding gross distortions of reality. There are ways to control the behaviour of terminals; I've described more or less what happens in the default case.
Upvotes: 11
Reputation: 386541
I'm going to start by explaining the difference between read and fread.
About fread:
- fread is a stdio library function.
- fread works with "streams" (what it calls FILE *).
- fread buffers.
- fread may read more than requested, storing the excess in a buffer.
- fread may perform multiple system calls.
- fread may block if less data than requested is available. It will return the amount of data requested unless EOF is encountered or an error occurs. (The C standard guarantees this: fread returns fewer items than requested only on a read error or end-of-file.)
About read:
- read is a Unix system call.
- read works with "file descriptors" (OS file handles).
- read doesn't buffer.
- read will not read more than requested.
- read only performs one system call.
- read returns immediately if data is available to be returned, even if the amount of data is less than the amount requested. (POSIX permits such short reads, e.g. for pipes and terminals that have fewer bytes immediately available.)
Where fgetc falls into this
fgetc is a function from the stdio library, just like fread. As such, the following are equivalent:
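// Using fgetc: returns the byte as an unsigned char converted to int, or EOF.
int rv = fgetc( stdin );
if ( rv == EOF ) {
   // Handle error or EOF.
} else {
   char ch = rv;
   // Do something with byte read.
}
// Using fread( buffer, item size, item count, stream ): returns the number of items read.
char ch;
size_t n = fread( &ch, 1, 1, stdin );
if ( n == 0 ) {
   // Handle error or EOF.
} else {
   // Do something with byte read.
}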
Because of the buffering performed by stdio functions, it's not wise to use both stdio and non-stdio functions with the same file descriptor.
About the difference in behaviour of your programs
Because fgetc and fread are buffering functions, they may read more than requested. This is why your program is absorbing ello\n. The excess is stored in the stream's buffer for future calls to fgetc and/or fread to return. None occur before the program exits, so the ello\n is lost.
Because read doesn't buffer, it doesn't read more than requested, and it doesn't consume the ello\n.
Upvotes: 8