nodakai

Reputation: 8033

Is it possible to replicate cat(1) with POSIX sh(1)?

POSIX sh(1) is capable of various file descriptor operations (equivalent to open(2), close(2) and dup(2), etc.) as well as read-ing a single line from STDIN.

So I got the impression that cat(1) could be replaced with a POSIX-compliant shell script, but I haven't come up with an actual implementation. Is it really possible? If not, what capability of cat(1) is missing from sh(1)? (Forget about GNU extensions for now.)

Don't ask me why I want to do that. As an intellectual quiz, maybe?

Upvotes: 1

Views: 335

Answers (2)

rici

Reputation: 241861

cat can copy any file to stdout; the file does not need to be a text file. It might include NULs, for example, and a NUL cannot be represented in a sh string. So that would definitely be a feature of cat that would be very difficult, if not impossible, to implement. [Note 1]

Other than that, you should be able to wrap a read and echo inside a while loop, although there are some tricky issues. (Accurately reproducing a non-empty file which does not end in a newline, for example.)
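A minimal sketch of that loop, for text input only (no NULs). The function name cat_text is made up for illustration, and printf stands in for echo because its output is more predictable; like echo, it is technically a utility rather than part of sh. The handling of an unterminated last line relies on read leaving the partial line in the variable when it hits end-of-file, which is what common shells such as bash and dash do, but POSIX only specifies read for text-file input, so treat that part as an assumption:

```shell
# cat_text: copy standard input to standard output, line by line.
# Hypothetical helper; only works for input without NUL bytes.
cat_text() {
  # IFS= keeps leading/trailing whitespace, -r keeps backslashes literal.
  while IFS= read -r line; do
    printf '%s\n' "$line"
  done
  # read fails at end-of-file. If the input did not end in a newline,
  # common shells still leave the partial last line in $line, so emit it
  # without adding a newline. For newline-terminated (or empty) input,
  # $line is empty here and nothing is printed.
  printf '%s' "$line"
}

# Usage (reads standard input):
#   cat_text < some-file
```

Without the final printf, a file whose last line lacks a newline would lose that line.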

But, technically, echo is no more part of sh than cat is; just like cat, it is a utility which might not be present (on a non-Posix system). In practice, environments without echo are about as likely as environments without cat; if you have sh, you have a reasonable expectation of finding the standard command line utilities.


Notes

  1. The only option accepted by a minimal Posix-compatible read is -r. However, if we had the bash implementation of read, we could copy a file character by character, even though the NUL character will never actually appear in a shell variable:

    while IFS= read -d '' -rn1 char; do
      if [ -z "$char" ]; then printf '\0'; else printf '%s' "$char"; fi
    done < "$1" > "$2"
    

    Example:

    $ printf 'foo\0bar\n\nbye' |
    > while IFS= read -d '' -rn1 char; do
    >   if [ -z "$char" ]; then printf '\0'; else printf '%s' "$char"; fi
    > done |
    > hd
    00000000  66 6f 6f 00 62 61 72 0a  0a 62 79 65              |foo.bar..bye|
    0000000c
    

    The complete set of options to read in that invocation is carefully crafted to work around a variety of idiosyncrasies in the bash implementation:

    • IFS= avoids trailing whitespace characters being removed from the result.
    • -n1 causes one character to be read, up to the delimiter. Intuitively, -N1 would be more natural, since -N1 ignores the delimiter. However, read also strips NUL characters from the input. Since the intent is to store zero characters in $char if the next character is a NUL, we can avoid the problem by using -n1 and setting the delimiter to NUL, which works because the delimiter check is done before the NULs are stripped.
    • -d '' sets the line delimiter character to NUL. See above.
    • -r prevents \ from being interpreted as an escape character in the input stream; this is the only Posix-compatible option in the set.
       

    It should go without saying that the above is only of theoretical interest, or as an intellectual quiz as per the OP. In practice, a shell script should do no more than coordinate the work of external utilities, and the existence of Posix-compatible utilities such as cat, dd, head and tail should be sufficient for any file copying needs.

Upvotes: 7

chepner

Reputation: 531808

(This is essentially the same as @rici's answer, but with a concrete example of a file that cannot be displayed with sh alone.)

cat cannot be replicated using sh alone. This is because sh does not provide any method for moving bytes from one file to another that does not involve a shell parameter, and shell parameters cannot contain NUL bytes.

Here's a simple example:

printf 'foo\0bar\n' > tmp.txt  # Create a file containing a NUL byte.
IFS= read -r line < tmp.txt    # Read that line into a variable.
echo "$line"                   # Prints "foo" (or "foobar" in shells that drop the NUL); the NUL is lost either way.
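A quick way to see the loss without eyeballing the output is to count bytes before and after a round trip through a shell variable. This is only a sketch; the variable names original and copied are made up for illustration, and the exact count after the round trip depends on how the shell's read treats the NUL:

```shell
# The input is 8 bytes: f o o NUL b a r newline.
original=$(printf 'foo\0bar\n' | wc -c)

# Pass the same bytes through read and printf, then count again.
# The shell variable cannot hold the NUL, so the byte is lost on the way.
copied=$(printf 'foo\0bar\n' | { IFS= read -r line; printf '%s\n' "$line"; } | wc -c)

echo "original: $original bytes, after read/printf: $copied bytes"
```

In a shell whose parameters cannot hold a NUL, the second count comes out smaller than the first.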

Upvotes: 1
