Hatshepsut

Reputation: 6662

Human-readable filesize and line count

I want a bash command that will return a table, where each row is the human-readable filesize, number of lines, and filename. The table should be sorted by filesize.

I've been trying to do this using a combination of du -hs, wc -l, and sort -h, and find.

Here's where I'm at:

find . -exec echo $(du -h {}) $(wc -l {}) \; | sort -h

Upvotes: 1

Views: 1004

Answers (3)

mklement0

Reputation: 440132

Your approach fell short not only because the shell expanded your command substitutions ($(...)) up front, but more fundamentally because you cannot pass shell command lines directly to find:

find's -exec action can only invoke external utilities with literal arguments - the only non-literal argument supported is the {} representing the filename(s) at hand.
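To make that concrete, here is a minimal sketch showing who actually gets to see the $(...). The scratch directory and demo.txt are made up for illustration, and the second result assumes GNU (or BSD) find, which substitutes {} even inside a longer argument.

```shell
# Work in a throwaway directory so the demo does not touch real files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'a\nb\n' > demo.txt

# The calling shell runs $(echo {}) first, which yields a literal {}.
# find therefore only ever sees: -exec echo {} \;
expanded=$(find . -type f -exec echo "$(echo {})" \;)

# Single quotes hide $(...) from the calling shell, but find does not
# interpret it either: echo just receives it as literal text.
literal=$(find . -type f -exec echo '$(echo {})' \;)

echo "expanded up front: $expanded"
echo "passed literally:  $literal"
```

Either way, no command substitution ever runs once per file, which is why the original attempt cannot work as written.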

choroba's answer fixes your immediate problem by invoking a separate shell instance in each iteration, to which the shell command to execute is passed as a string argument (-exec bash -c '...' \;).
While this works (assuming you pass the {} value as an argument rather than embedding it in the command-line string), it is also quite inefficient, because multiple child processes are created for each input file.
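As a rough illustration of why the {} value should be passed as an argument, the sketch below uses a hypothetical file name containing a blank. It assumes a find that substitutes {} inside a longer argument (GNU and BSD find do; POSIX leaves this unspecified).

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch 'a b.txt'

# Embedded: find pastes the name into the command string, so the inner
# shell re-parses it and the blank splits the name into two words.
embedded=$(find . -type f -exec sh -c 'printf "%s\n" {}' \; | wc -l | tr -d ' ')

# As an argument: the name arrives as $1 and is never re-parsed.
as_arg=$(find . -type f -exec sh -c 'printf "%s\n" "$1"' sh {} \; | wc -l | tr -d ' ')

echo "embedded: $embedded line(s), as argument: $as_arg line(s)"
```

The same re-parsing is what makes the embedded form unsafe with names that contain shell metacharacters, not just blanks.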

(While there is a way to have find pass (typically) all input files to a (typically) single invocation of the specified external utility - namely with terminator + rather than \;, this is not an option here due to the nature of the command line passed.)
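The difference between the two terminators can be seen with a small sketch, in which each invocation simply reports how many filenames it was handed. The three empty files are made up for the demo, and with so few files + should need only a single invocation.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/a" "$tmp/b" "$tmp/c"

# Each sh invocation prints the number of filenames it received ($#).
# With \; there is one invocation (and one output line) per file.
per_file=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} \;)

# With + the filenames are batched into as few invocations as possible.
batched=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'with \\; :\n%s\n' "$per_file"
printf 'with +  :\n%s\n' "$batched"
```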

An efficient and robust[1] implementation that minimizes the number of child processes created would look like this:

Note: I'm assuming GNU utilities here, due to use of head -n -1 and sort -h.
Also, I'm limiting find's output to files only (as opposed to directories), because wc -l only works on files.

paste <(find . -type f -exec du -h {} +) <(find . -type f -exec wc -l {} + | head -n -1) |
  awk -F'\t *' 'BEGIN{OFS="\t"} {sub(" .+$", "", $3); print $1,$2,$3}' |
  sort -h -t$'\t' -k1,1
  • Note the use of -exec ... + rather than -exec ... \;, which ensures that typically all input filenames are passed to a single invocation to the external utility (if not all filenames fit on a single command line, invocations are batched efficiently to make as few calls as possible).

  • wc -l {} + outputs a summary ("total") line whenever it is given more than one file, which head -n -1 strips away; it also outputs the filename after each line count.

  • paste combines the lines from each command (whose respective inputs are provided by a process substitution, <(...)) into a single output stream.

  • The awk command then strips the extraneous filename that stems from wc from the end of each line.

  • Finally, the sort command sorts the result by the 1st (-k1,1) tab-separated (-t$'\t') column by human-readable numbers (-h), such as the numbers that du -h outputs (e.g., 1K).
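
To see what the awk and sort stages do in isolation, here is a small sketch that feeds them hand-written input instead of real find output. The sample lines and file names are made up, and sort -h again assumes GNU sort.

```shell
tab=$(printf '\t')

# One joined line as paste would emit it: du's "size<TAB>name", a tab,
# then wc's right-aligned "count name".
line=$(printf '4.0K\t./notes.txt\t  12 ./notes.txt')

# Same awk program as above: split on tab plus optional blanks, then cut
# everything after the line count out of the third field.
cleaned=$(printf '%s\n' "$line" |
  awk -F'\t *' 'BEGIN{OFS="\t"} {sub(" .+$", "", $3); print $1,$2,$3}')

# sort -h compares human-readable sizes, so 12K sorts before 1.5M.
sorted=$(printf '1.5M\t./b\t900\n4.0K\t./a\t12\n12K\t./c\t40\n' |
  sort -h -t "$tab" -k1,1)

printf '%s\n' "$cleaned"
printf '%s\n' "$sorted"
```

Note that print $1,$2,$3 yields size, filename, line count; printing $1,$3,$2 instead would give the size, lines, filename order the question asks for.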


[1] As with any line-oriented processing, filenames with embedded newlines are not supported, but I do not consider this a real-world problem.

Upvotes: 1

eckes

Reputation: 10433

OK, I tried it with find/-exec as well, but the escaping is hell. With a shell function it is pretty straightforward:

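#!/bin/bash
# Print size, line count and name (without the leading ./) for one file.
function dir
{
    du=$(du -sh "$1" | awk '{print $1}')
    wc=$(wc -l < "$1")
    printf "%10s %10s %s\n" "$du" "$wc" "${1#./}"
}

printf "%10s %10s %s\n" "size" "lines" "name"
# Clear IFS so read keeps blanks in filenames; restore it afterwards.
OIFS=$IFS; IFS=""
find . -type f -print0 | while read -r -d $'\0' f; do dir "$f"; done
IFS=$OIFS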

Using the bashism read -d, it is even kind of safe, thanks to the NUL terminator. Setting IFS is needed to keep read from stripping leading and trailing blanks in filenames.

BTW: $'\0' does not really work (same as '') - but it makes the intention clear.
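
For comparison, a common variant of the same loop sets IFS only for the read command itself, so nothing has to be saved and restored. This is just a sketch and requires bash; the scratch directory and the file name with surrounding blanks are made up for the demo.

```shell
#!/bin/bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/plain" "$tmp/ padded "

names=()
# IFS= applies to this read only; -d '' makes read stop at NUL bytes.
while IFS= read -r -d '' f; do
    names+=("${f##*/}")   # keep just the base name for display
done < <(find "$tmp" -type f -print0)

# Brackets make the preserved leading/trailing blanks visible.
printf '[%s]\n' "${names[@]}"
```

Feeding the loop through a process substitution rather than a pipe keeps it in the current shell, so the names array is still populated after the loop ends.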

Sample output:

      size      lines name
      156K        708 sash
       16K         64 hostname
      120K        460 netstat
       40K        110 fuser
      644K       1555 dir/bash
       28K         82 keyctl
      2.3M       8067 vim

Upvotes: 1

choroba

Reputation: 242208

The problem is that your shell interprets the $(...), so find doesn't get them. Escaping them doesn't help, either (\$\(du -h {}\)), as they become normal parameters to the commands, not command substitution.

The way to have them interpreted as command substitutions is to call a new shell, either directly

find . -exec bash -c 'echo $(du -h "$1") $(wc -l "$1")' _ {} \; | sort -h

or by creating a script and calling it from find.
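
The script variant could look roughly like the sketch below. The helper name size-lines.sh, the scratch directory and sample.txt are all hypothetical, and sort -h assumes GNU sort.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'one\ntwo\nthree\n' > "$tmp/sample.txt"

# Hypothetical helper: prints size, line count and name, tab-separated,
# for every file passed as an argument.
cat > "$tmp/size-lines.sh" <<'EOF'
for f in "$@"; do
    size=$(du -h "$f" | cut -f1)
    lines=$(wc -l < "$f")
    # $((lines)) drops the padding some wc implementations add.
    printf '%s\t%s\t%s\n' "$size" "$((lines))" "$f"
done
EOF

# find hands the filenames to the helper as ordinary arguments.
report=$(find "$tmp" -type f -name '*.txt' -exec sh "$tmp/size-lines.sh" {} + | sort -h)
printf '%s\n' "$report"
```

Because the helper loops over "$@", it also works with the + terminator, so one shell handles many files instead of one shell per file.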

Upvotes: 0
