1w3j

Why does the 'tail' command return the whole file content when using -c+1 or -c-1?

As the title says, I can't figure out why tail behaves this way. I have a file named myfile.txt with the following content:

 firstline
 secondline
 thirdline

So when I use:

 tail -c-1 myfile.txt

or

 tail -c+1 myfile.txt

it outputs:

 firstline
 secondline
 thirdline

man tail:

 -c, --bytes=[+]NUM
        output the last NUM bytes; or use -c +NUM to output starting with byte NUM of each file

Answers (1)

mklement0

  • tail -c+1 myfile.txt is the same as cat myfile.txt: you're telling tail to start output with the first (+1) byte (-c), in other words: the whole file.

  • tail -c-1 myfile.txt (more typically: tail -c1 myfile.txt) outputs only the last byte in myfile.txt.
    Assuming that myfile.txt is a properly formatted text file that ends with a trailing \n and uses an encoding in which \n is a single byte (a single-byte encoding such as ASCII, or an ASCII-compatible one such as UTF-8), this outputs just that \n, i.e., an empty line.
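Both cases can be reproduced with a short POSIX sh script. This is a sketch assuming GNU or BSD/macOS tail; the scratch directory created with mktemp -d (available on both, though not specified by POSIX) is only there so that an existing myfile.txt isn't overwritten. od turns the otherwise invisible trailing newline into the hex byte 0a.

```shell
#!/bin/sh
# Recreate the OP's myfile.txt in a throwaway directory.
dir=$(mktemp -d)
printf 'firstline\nsecondline\nthirdline\n' > "$dir/myfile.txt"

# -c +1: start at byte 1, i.e. the whole file; cmp confirms it matches cat.
tail -c +1 "$dir/myfile.txt" | cmp -s - "$dir/myfile.txt" && echo 'tail -c +1 == cat'

# -c -1 and -c 1: only the last byte, which is the trailing \n (hex 0a).
tail -c -1 "$dir/myfile.txt" | od -A n -t x1
tail -c 1 "$dir/myfile.txt" | od -A n -t x1

# Clean up the scratch directory.
rm -r "$dir"
```

Without the od stage, the last two commands appear to print nothing but an empty line.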

To put tail's basic logic in general terms (covers both the GNU and the BSD/macOS implementation):

Note:
- tail supports additional options not discussed here; see man 1 tail.
- In all forms below, whitespace between the option (-<units>) and its option-argument (<count> / -<count> / +<index>) is optional; this applies generically to POSIX-compatible utilities.
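As a minimal illustration of the whitespace rule (assuming GNU or BSD/macOS tail; sample is a hypothetical helper that just prints the OP's three lines), each pair of commands below should produce the same output. $(...) strips trailing newlines, so the comparisons ignore them.

```shell
#!/bin/sh
# Hypothetical helper: prints the OP's sample file content to stdout.
sample() { printf 'firstline\nsecondline\nthirdline\n'; }

# Option-argument separated by a space vs. attached directly.
[ "$(sample | tail -c 4)"  = "$(sample | tail -c4)" ]  && echo '-c 4  == -c4'
[ "$(sample | tail -c -4)" = "$(sample | tail -c-4)" ] && echo '-c -4 == -c-4'
[ "$(sample | tail -n +2)" = "$(sample | tail -n+2)" ] && echo '-n +2 == -n+2'
```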

 # Output <count> units *from the end* of the input.
 # The following two forms are equivalent.
 tail -<units> <count>
 tail -<units> -<count>
 # This next form is equivalent to: tail -n <line-count>.
 tail -<line-count>

 # Output everything *starting from* 1-based unit index <index>.
 # In other words: skip <index> - 1 units at the *start* of the input.
 tail -<units> +<index>
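Filling in the placeholders with concrete values shows how the forms line up. This sketch runs against the OP's sample content (sample is a hypothetical helper; the input is 10 + 11 + 10 = 31 bytes) and assumes GNU or BSD/macOS tail; every command should print thirdline.

```shell
#!/bin/sh
# Hypothetical helper: the OP's sample content ("firstline\n" is 10 bytes,
# "secondline\n" is 11, "thirdline\n" is 10; 31 bytes in total).
sample() { printf 'firstline\nsecondline\nthirdline\n'; }

# <count> units from the end: the last line.
sample | tail -n 1
sample | tail -n -1
sample | tail -1          # shorthand for: tail -n 1

# <count> bytes from the end: the last 10 bytes are "thirdline\n".
sample | tail -c 10

# Starting from 1-based <index>: skip <index> - 1 units.
sample | tail -n +3       # skip 2 lines
sample | tail -c +22      # skip 21 bytes ("firstline\n" + "secondline\n")
```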
  • Option -<units> specifies the units for option-argument <count> / <index>.

    • -n refers to lines
      • Omitting -<units> and using only -<count>, with the required - prefix, is the same as -n <count>. BSD/macOS tail also supports the form +<count> to imply -n +<count>, but GNU tail doesn't.
    • -c refers to bytes(!) and is not UTF-8-aware in either implementation.
    • BSD/macOS tail additionally supports -b for 512-byte blocks.
  • If <count> has no sign (e.g., 1), or an explicit minus (e.g., -1; never necessary), <count> units are returned from the end of the input.

  • A +-prefixed value is taken as a 1-based(!) unit index from the start of the input at which output begins. In other words, +<index> means: skip <index> - 1 units at the beginning and output all the rest; e.g., tail -n +2 outputs everything starting from (and including) the 2nd line.

  • Omitting both -<units> and <count> / -<count> / +<index> is the same as tail -n 10, meaning that tail's default behavior is to output the input's last 10 lines.
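A few of the rules above, checked in a short POSIX sh sketch (assuming GNU or BSD/macOS tail; seq is not POSIX but ships with both GNU coreutils and macOS). The last command illustrates that -c counts bytes: "é" is the two bytes c3 a9 in UTF-8 (written as octal escapes so the script doesn't depend on its own encoding), so the last two bytes of "é\n" are a9 0a, i.e., half a character plus the newline.

```shell
#!/bin/sh
# No options: the last 10 lines, so 11 through 20 of 1..20.
seq 1 20 | tail

# -n +2: everything from, and including, the 2nd line ("b" and "c").
printf 'a\nb\nc\n' | tail -n +2

# -c counts bytes: the last 2 bytes of UTF-8 "é\n" split the character.
printf '\303\251\n' | tail -c 2 | od -A n -t x1
```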


If we apply this logic to the OP's follow-up question regarding the behavior of -c+0 and -c-0:

  • -c+0 is treated the same as -c+1 and therefore outputs the entire input (same as cat): you're asking for everything starting at the "zeroth" byte, which doesn't exist; but since 1 is the first actual byte position and 0 precedes it, you still get the entire input.

  • -c-0 outputs nothing at all, because you're asking to return zero bytes (in other words: nothing) from the end of the input.
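Both edge cases can be checked by counting output bytes. This is a sketch assuming GNU or BSD/macOS tail (sample is a hypothetical helper; wc -c may pad its number with leading spaces on BSD/macOS).

```shell
#!/bin/sh
# Hypothetical helper: the OP's 31-byte sample content.
sample() { printf 'firstline\nsecondline\nthirdline\n'; }

# -c +0 behaves like -c +1: the entire input (31 bytes).
sample | tail -c +0 | wc -c
sample | tail -c +1 | wc -c

# -c -0 (same as -c 0) asks for zero bytes from the end: no output (0 bytes).
sample | tail -c -0 | wc -c
```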
