Reputation: 8033
POSIX sh(1) is capable of various file descriptor operations (equivalent to open(2), close(2) and dup(2), etc.) as well as read-ing a single line from STDIN.
So I got the impression that we can replace cat(1) with a POSIX-compliant shell script, but I haven't come up with an actual implementation. Is it really possible, or what function of cat(1) might be missing from sh(1)? (Forget about GNU extensions for now.)
Don't ask me why I want to do that. As an intellectual quiz, maybe?
Upvotes: 1
Views: 335
Reputation: 241861
cat can copy any file to stdout; the file does not need to be a text file. It might include NULs, for example, and a NUL cannot be represented in a sh string. So that would definitely be a feature of cat that would be very difficult, if not impossible, to implement. [Note 1]
Other than that, you should be able to wrap a read and echo inside a while loop, although there are some tricky issues. (Accurately reproducing a non-empty file which does not end in a newline, for example.)
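As a rough sketch of that loop, assuming text input with no NUL bytes: the function name cat_text is made up for illustration, and printf stands in for echo because echo may interpret backslashes in some shells. The check after the loop relies on read leaving a final unterminated line in the variable when it hits end-of-file, which common shells do.

```sh
# cat_text (hypothetical name): copy text from stdin to stdout.
# Assumes the input contains no NUL bytes.
cat_text() {
    line=
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done
    # At end-of-file read fails, but a last line without a trailing
    # newline is still left in $line; emit it without adding a newline.
    if [ -n "$line" ]; then
        printf '%s' "$line"
    fi
}
```

It would be used as a filter, for example `cat_text < input.txt > output.txt`.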
But, technically, echo is no more part of sh than cat is; just like cat, it is a utility which might not be present (on a non-Posix system). In practice, environments without echo are about as likely as environments without cat; if you have sh, you have a reasonable expectation of finding the standard command line utilities.
The only option accepted by a minimal Posix-compatible read is -r. However, if we had the bash implementation of read, we could copy a file character by character, even though the NUL character will never actually appear in a shell variable:
while IFS= read -d '' -rn1 char; do
    if [ -z "$char" ]; then printf '\0'; else printf '%s' "$char"; fi
done < "$1" > "$2"
Example:
$ printf 'foo\0bar\n\nbye' |
> while IFS= read -d '' -rn1 char; do
> if [ -z "$char" ]; then printf '\0'; else printf '%s' "$char"; fi
> done |
> hd
00000000 66 6f 6f 00 62 61 72 0a 0a 62 79 65 |foo.bar..bye|
0000000c
The complete set of options to read in that invocation is carefully crafted to work around a variety of idiosyncrasies in the bash implementation:

IFS= avoids leading or trailing whitespace characters being removed from the result.

-n1 causes one character to be read, up to the delimiter. Intuitively, -N1 would be more natural, since -N1 ignores the delimiter. However, read also strips NUL characters from the input. Since the intent is to store zero characters in $char if the next character is a NUL, we can avoid the problem by using -n1 and setting the delimiter to NUL, which works because the delimiter check is done before the NULs are stripped.

-d '' sets the line delimiter character to NUL. See above.

-r avoids having \ being interpreted in the input stream; this is the only Posix-compatible option in the set.

It should go without saying that the above is only of theoretical interest, or as an intellectual quiz as per the OP. In practice, a shell script should do no more than coordinate the work of external utilities, and the existence of Posix-compatible utilities such as cat, dd, head and tail should be sufficient for any file copying needs.
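To check that the bash loop really is byte-exact, one option is to wrap it in a function and compare its output with the input using cmp. This is only a sketch: copy_bytes is a made-up name, the sample bytes are the ones from the example above, and it assumes bash (not plain sh) and a locale in which each input byte is read as one character.

```bash
#!/usr/bin/env bash
# copy_bytes (hypothetical name): the read loop from above, as a filter.
copy_bytes() {
    local char
    while IFS= read -d '' -rn1 char; do
        # An empty $char means read stopped at the NUL delimiter.
        if [ -z "$char" ]; then printf '\0'; else printf '%s' "$char"; fi
    done
}

tmp=$(mktemp)
printf 'foo\0bar\n\nbye' > "$tmp"
# cmp -s exits 0 only if both byte streams are identical.
if copy_bytes < "$tmp" | cmp -s - "$tmp"; then
    echo "identical"
else
    echo "different"
fi
rm -f "$tmp"
```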
Upvotes: 7
Reputation: 531808
(This is essentially the same as @rici's answer, but with a concrete example of a file that cannot be displayed with sh alone.)
cat cannot be replicated using sh alone. This is because sh does not provide any method for moving bytes from one file to another that does not involve a shell parameter, and shell parameters cannot contain NUL bytes.
Here's a simple example:
printf 'foo\0bar\n' > tmp.txt # Create a file containing a null byte
IFS= read -r line < tmp.txt # Read that line into a variable.
echo "$line" # The NUL byte is lost: prints "foo" or "foobar", depending on the shell
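One hedged way to see the loss is to count bytes: the input is 8 bytes long, and whatever passes through the variable comes out shorter. The exact count is shell-dependent (some shells drop the NUL, others truncate at it), so no fixed output is assumed here.

```sh
# The input is 8 bytes: f o o NUL b a r newline.
printf 'foo\0bar\n' | wc -c

# Pass the same bytes through a shell variable and count again.
# The NUL cannot survive in $line, so fewer than 8 bytes come out.
printf 'foo\0bar\n' | {
    IFS= read -r line
    printf '%s\n' "$line"
} | wc -c
```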
Upvotes: 1