Cyker

Reputation: 10894

bash: read discards terminal line input after 4096 bytes

To demonstrate this problem, paste a long string (>4096 bytes) after running this command in Linux:

read foo && wc -c <<<"$foo"

The result is 4096, which means the input is truncated.

Some research revealed that the terminal line buffer size is hardcoded to 4096 bytes, which explains the truncation. However, when I tried to read with the -n option, it worked:

read -n 32768 foo && wc -c <<<"$foo"

The result is the actual length of the input (+1, because the here-string appends a newline) rather than 4096.

So I'd like to know what the magic is behind the option -n 32768. I didn't find anything about this in the bash man page. Is this a feature we can rely on?
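The +1 can be checked in isolation. A minimal sketch, with the variable s used purely for illustration: a here-string appends a newline to the expanded text, so wc -c counts one byte more than the variable holds.

```bash
s='abc'

# Here-string: bash appends a newline, so wc sees "abc\n" (4 bytes).
wc -c <<<"$s"

# printf '%s' adds nothing, so wc sees exactly the 3 bytes in $s.
printf '%s' "$s" | wc -c
```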

Upvotes: 7

Views: 1497

Answers (2)

rici

Reputation: 241671

Bash's implementation of read lets you specify a maximum number of characters to read, using the -n flag, or an alternative termination character, using the -d flag. Neither of those options would work with standard terminal input, because normally the terminal driver keeps the input in its own internal buffer until the user types the ENTER key (or certain other keystrokes, like Control-C or Control-D).

The idea behind, for example, read -n1 char is that you want the read to return as soon as the user types a single character, not that you want the read to wait for the user to type a complete line and then return the first character of that line. Similarly, the command read -d';' command should return as soon as the user types a semicolon; again, waiting for the user to type a complete line and then just returning the part of it up to the semicolon would be unexpected.
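Where each option makes read stop can be seen without a terminal by piping input in. This is only a sketch of the stopping rule: on a pipe there is no terminal driver involved, so it says nothing about buffering. The variable names chars and cmd are arbitrary.

```bash
# -n 3: return after three characters, even though no newline has arrived.
printf 'abcdef' | { IFS= read -r -n 3 chars; printf '%s\n' "$chars"; }

# -d ';': return at the first semicolon; the delimiter itself is not stored.
printf 'ls -l; echo hi\n' | { IFS= read -r -d ';' cmd; printf '%s\n' "$cmd"; }
```

The first command prints abc and the second prints ls -l.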

So in order for these options to work as expected, the read builtin needs to tell the terminal driver to return characters as soon as they are typed. If the input device is a terminal, and you specify a maximum input length or a delimiter character other than newline, read puts the terminal into "raw" mode, by modifying the following termios flags:

off: ICANON INLCR OCRNL ONOCR ONLRET
on: ISIG IEXTEN ICRNL OPOST ONLCR

With ICANON turned off, the terminal driver no longer buffers input line by line.
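For experimenting, roughly the same flag changes can be applied by hand with stty. The following is a sketch, not what bash does internally: with_unbuffered_tty is a hypothetical helper name, and it only toggles the flags listed above, saving and restoring the previous settings around the command it runs.

```bash
# Hypothetical helper: run a command with the terminal configured roughly as
# in the flag list above, then restore the previous settings.
with_unbuffered_tty() {
    if ! [ -t 0 ]; then
        # Not a terminal: there is no line discipline to reconfigure.
        "$@"
        return
    fi
    local saved status
    saved=$(stty -g)          # opaque snapshot of the current settings
    stty -icanon -inlcr -ocrnl -onocr -onlret \
          isig iexten icrnl opost onlcr
    "$@"
    status=$?
    stty "$saved"             # put the terminal back the way it was
    return "$status"
}
```

For example, with_unbuffered_tty read -r line reads up to the next newline with canonical mode switched off. If the command is killed by a signal, the restore step is skipped, so a trap would be needed for anything beyond a quick test.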

As noted in the original post, the Linux kernel driver uses a fixed-length 4096-byte input buffer in order to implement line editing, and it will simply ignore typed characters which don't fit in this buffer. So with the terminal in normal input mode, your input will be truncated at 4096 bytes. With ICANON turned off, the driver passes characters through as soon as possible and input is not truncated.

But a side-effect of turning off input canonicalisation is that the terminal driver no longer interprets the backspace and delete keys, making line-editing impossible. You can try this:

# I typed a, x, backspace, b, return
$ read -n 4 input
ax^?b
$ printf "%s" "$input" | hd
00000000  61 78 7f 62                                       |ax.b|
00000004

Note that the delete character sent by the backspace key (0x7f) is retained in the input.

That's a less than ideal user experience; you certainly wouldn't want it for typing long inputs. In most cases, people expect backspace to "work". However, it's perfect for writing little console games where the script needs to react to every keystroke as it's typed.
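A keystroke loop of that kind might look like the sketch below. key_loop is a hypothetical name, and quitting on q is an arbitrary choice for the example.

```bash
# Hypothetical key loop: react to each key until "q" is pressed.
key_loop() {
    local key
    # -r: keep backslashes; -s: do not echo; -n 1: return after one character.
    while IFS= read -r -s -n 1 key; do
        [ "$key" = q ] && break
        printf 'key: %q\n' "$key"   # %q makes control characters visible
    done
}
```

Run on a terminal, key_loop prints one line per key as it is typed, without waiting for ENTER. Because newline is still the delimiter, pressing ENTER shows up as an empty string.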

Bash itself uses the readline library to read input. readline also puts the terminal into raw mode, but unlike the read builtin it actually handles backspace characters, arrow keys, and a long list of other keys, including ones bound to features the kernel driver obviously knows nothing about, like tab completion and history searching.

The read builtin also has the -e flag, which causes it to use readline (if it is reading from a terminal). Doing the above experiment with -e produces possibly more convenient results:

# I typed a, x, backspace, b, c, d
$ read -en4 input
abcd
$ printf "%s" "$input" | hd
00000000  61 62 63 64                                       |abcd|
00000004

This time, readline handled the backspace, and the read returned after four "real" characters were typed.
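Following the same reasoning, -e without -n looks like a reasonable way to accept a long pasted line while keeping line editing. This is a sketch under that assumption, worth verifying on your own system; read_line_editable is a hypothetical helper name.

```bash
# Hypothetical helper: read one line into the variable named by $1.
read_line_editable() {
    # -e: use readline when stdin is a terminal; -r: keep backslashes literal.
    IFS= read -e -r "$1"
}
```

Usage would be something like read_line_editable foo && printf '%s' "$foo" | wc -c, which counts the bytes actually stored, without the extra newline a here-string would add.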

Upvotes: 4

Shawn

Reputation: 52336

From the bash manpage section on read:

-n nchars

read returns after reading nchars characters rather than waiting for a complete line of input, but honors a delimiter if fewer than nchars characters are read before the delimiter.

(I'm pretty sure this is a bash-specific extension, not something you can rely on in other shells unless you verify that a particular one also supports it.) Edit: zsh, for example, does something very different with -n.
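If a script might run under another shell, one option is to guard the bash-only form. A rough sketch, where first_chars is a hypothetical helper and the dd fallback is my own substitution rather than an equivalent: it counts bytes instead of characters and does not stop at a newline.

```bash
# Hypothetical helper: print the first $1 characters of stdin.
first_chars() {
    if [ -n "${BASH_VERSION:-}" ]; then
        local buf
        IFS= read -r -n "$1" buf         # bash: stops at $1 characters or a newline
        printf '%s' "$buf"
    else
        dd bs=1 count="$1" 2>/dev/null   # fallback: counts bytes, ignores newlines
    fi
}
```

For example, printf 'abcdef' | first_chars 3 prints abc.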

Upvotes: -1
