Is it possible to replicate cat(1) with POSIX sh(1)?

Question

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html

POSIX sh(1) is capable of various file descriptor operations (equivalent to open(2), close(2), dup(2), etc.) as well as reading a single line from stdin.

So I got the impression that cat(1) could be replaced with a POSIX-compliant shell script, but I haven't come up with an actual implementation. Is it really possible? If not, what functionality of cat(1) is missing from sh(1)? (Forget about GNU extensions for now.)

Don't ask me why I want to do that. As an intellectual quiz, maybe?

rici · Accepted Answer

cat can copy any file to stdout; the file does not need to be a text file. It might include NULs, for example, and a NUL cannot be represented in a sh string. So that would definitely be a feature of cat that would be very difficult, if not impossible, to implement. [Note 1]
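
The NUL limitation is easy to observe. The following is only a minimal sketch: the variable name `s` is made up for illustration, and exactly how the NUL gets lost (dropped silently, dropped with a warning, or otherwise) is up to the shell.

```sh
# Try to capture three bytes (a, NUL, b) with command substitution.
s=$(printf 'a\0b')

# Print how many characters actually ended up in the variable.
printf 'characters stored: %s\n' "${#s}"
```

In shells such as bash or dash I would expect the count to come out smaller than three, because the NUL never makes it into the variable.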

Other than that, you should be able to wrap a read and echo inside a while loop, although there are some tricky issues. (Accurately reproducing a non-empty file which does not end in a newline, for example.)
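
A minimal sketch of such a loop, assuming text input without NUL bytes. The function names `copy_text` and `sh_cat` are made up for this example, and `printf` is used instead of `echo` because its handling of backslashes and leading dashes is more predictable. The final `printf` relies on the common behaviour that `read` still stores a partial last line in the variable when it hits end-of-file before a newline.

```sh
# copy_text: hypothetical helper that copies stdin to stdout line by line.
copy_text() {
  # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
  while IFS= read -r line; do
    printf '%s\n' "$line"
  done
  # read fails at end-of-file. Whatever is left in $line is a final line
  # that had no trailing newline, so emit it without adding one.
  printf '%s' "$line"
}

# sh_cat: hypothetical cat-like wrapper. No operands, or "-", means stdin.
sh_cat() {
  if [ "$#" -eq 0 ]; then set -- -; fi
  for f in "$@"; do
    if [ "$f" = - ]; then
      copy_text
    else
      copy_text < "$f"
    fi
  done
}
```

Usage would look like `sh_cat file1 file2 > combined`. None of cat's options (such as -u) are handled, and binary input is still out of reach for the reason given above.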

But, technically, echo is no more part of sh than cat is; just like cat, it is a utility which might not be present (on a non-POSIX system). In practice, environments without echo are about as likely as environments without cat; if you have sh, you have a reasonable expectation of finding the standard command line utilities.

Notes

The only option accepted by a minimal POSIX-compatible read is -r. However, if we had the bash implementation of read, we could copy a file character by character, even though the NUL character will never actually appear in a shell variable:
```
while IFS= read -d '' -rn1 char; do
  if [ -z "$char" ]; then printf '\0'; else printf '%s' "$char"; fi
done < "$1" > "$2"
```
Example:
```
$ printf 'foo\0bar\n\nbye' |
> while IFS= read -d '' -rn1 char; do
>   if [ -z "$char" ]; then printf '\0'; else printf '%s' "$char"; fi
> done |
> hd
00000000  66 6f 6f 00 62 61 72 0a  0a 62 79 65              |foo.bar..bye|
0000000c
```
The complete set of options to read in that invocation is carefully crafted to work around a variety of idiosyncrasies in the bash implementation:
- IFS= prevents leading and trailing whitespace characters from being removed from the result.
- -n1 causes one character to be read, up to the delimiter. Intuitively, -N1 would be more natural, since -N1 ignores the delimiter. However, read also strips NUL characters from the input. Since the intent is to store zero characters in $char if the next character is a NUL, we can avoid the problem by using -n1 and setting the delimiter to NUL, which works because the delimiter check is done before the NULs are stripped.
- -d '' sets the line delimiter character to NUL. See above.
- -r prevents \ from being interpreted as an escape character in the input stream; this is the only POSIX-compatible option in the set.
It should go without saying that the above is only of theoretical interest, or an intellectual quiz as per the OP. In practice, a shell script should do no more than coordinate the work of external utilities, and the existence of POSIX-compatible utilities such as cat, dd, head and tail should be sufficient for any file copying needs.
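
For comparison, here is a sketch of the "coordinate external utilities" approach using dd, which copies bytes verbatim, NULs included. The function name `copy_file` and the 4096-byte block size are arbitrary choices for this example.

```sh
# copy_file SRC DST: hypothetical helper that lets dd do the byte copying.
copy_file() {
  # if=, of= and bs= are standard dd operands. dd reports its record
  # counts on stderr, so that stream is discarded here; this also hides
  # error messages, but the exit status is still passed through.
  dd if="$1" of="$2" bs=4096 2>/dev/null
}
```

Something like `copy_file input.bin output.bin && cmp input.bin output.bin` would confirm that the copy is byte-for-byte identical.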
