Reputation: 6662
I want a bash command that will return a table, where each row is the human-readable filesize, number of lines, and filename. The table should be sorted by filesize.
I've been trying to do this using a combination of du -hs, wc -l, sort -h, and find.
Here's where I'm at:
find . -exec echo $(du -h {}) $(wc -l {}) \; | sort -h
Upvotes: 1
Views: 1004
Reputation: 440132
Your approach fell short not only because the shell expanded your command substitutions ($(...)) up front, but more fundamentally because you cannot pass shell command lines directly to find: find's -exec action can only invoke external utilities with literal arguments - the only non-literal argument supported is the {} representing the filename(s) at hand.
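To make the up-front expansion visible, here is a minimal sketch that prints the arguments a command would actually receive. The show_args helper is a made-up name for this illustration; nothing here runs find itself.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints each argument it receives on its own line.
show_args() { printf '[%s]\n' "$@"; }

# Unquoted: the calling shell runs $(...) first, so only its output reaches the command.
unquoted=$(show_args find . -exec echo $(echo "ran early") \;)

# Single-quoted: the text arrives verbatim and could be handed to a child shell later.
quoted=$(show_args find . -exec bash -c 'echo $(echo "ran later")' \;)

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted" "$quoted"
```

In the unquoted case the substitution is already gone by the time the command starts; in the quoted case it survives as one literal argument.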
choroba's answer fixes your immediate problem by invoking a separate shell instance in each iteration, to which the shell command to execute is passed as a string argument (-exec bash -c '...' \;).
While this works (assuming you pass the {} value as an argument rather than embedding it in the command-line string), it is also quite inefficient, because multiple child processes are created for each input file.
(While there is a way to have find pass (typically) all input files to a (typically) single invocation of the specified external utility - namely with terminator + rather than \; - this is not an option here due to the nature of the command line passed.)
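For reference, the argument-passing form mentioned above could look roughly like this. It is only a sketch: list_sizes_and_lines is a name I made up, GNU sort -h is assumed, and it still spawns several processes per file, so it shares the inefficiency described above.

```shell
#!/usr/bin/env bash
# Sketch: print "size<TAB>lines<TAB>name" for every regular file under a directory.
# list_sizes_and_lines is a hypothetical helper name, not part of any answer here.
list_sizes_and_lines() {
  local dir=${1:-.}
  # The inner script is single-quoted, so the outer shell leaves $(...) alone.
  # "_" becomes $0 of the inner shell; find substitutes {} and it arrives as $1.
  find "$dir" -type f -exec bash -c '
    size=$(du -h -- "$1" | cut -f1)
    lines=$(wc -l < "$1")
    printf "%s\t%s\t%s\n" "$size" "$lines" "$1"
  ' _ {} \; | sort -h
}
```

Called as list_sizes_and_lines . it walks the current directory. Because the filename reaches the inner shell as "$1" instead of being pasted into the script text, names containing spaces or quotes are not reinterpreted as shell code.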
An efficient and robust[1] implementation that minimizes the number of child processes created would look like this:
Note: I'm assuming GNU utilities here, due to use of head -n -1 and sort -h.
Also, I'm limiting find's output to files only (as opposed to directories), because wc -l only works on files.
paste <(find . -type f -exec du -h {} +) <(find . -type f -exec wc -l {} + | head -n -1) |
awk -F'\t *' 'BEGIN{OFS="\t"} {sub(" .+$", "", $3); print $1,$2,$3}' |
sort -h -t$'\t' -k1,1
Note the use of -exec ... + rather than -exec ... \;, which ensures that typically all input filenames are passed to a single invocation of the external utility (if not all filenames fit on a single command line, invocations are batched efficiently to make as few calls as possible).
When given more than one file, wc -l {} + outputs a trailing summary (total) line, which head -n -1 strips away; it also outputs the filename after each line count.
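A small sketch of that wc behavior, using two throwaway files in a temporary directory (the file names are made up, and GNU head is assumed for the negative line count):

```shell
#!/usr/bin/env bash
# Illustrative scratch files; the names are invented for this demo.
demo_dir=$(mktemp -d)
printf 'one\ntwo\n' > "$demo_dir/a.txt"
printf 'one\n'      > "$demo_dir/b.txt"

# With two or more file operands, wc -l appends a "total" line.
# LC_ALL=C keeps the label untranslated.
with_total=$(cd "$demo_dir" && LC_ALL=C wc -l a.txt b.txt)

# GNU head -n -1 prints everything except the last line, dropping the total.
without_total=$(cd "$demo_dir" && LC_ALL=C wc -l a.txt b.txt | head -n -1)

printf '%s\n---\n%s\n' "$with_total" "$without_total"
rm -rf "$demo_dir"
```

Note that with a single file operand wc prints no total line at all, so head -n -1 would then drop the only real line.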
paste combines the corresponding lines from the two commands (each supplied via a process substitution, <(...)) into a single output stream.
The awk command then strips the extraneous filename that stems from wc from the end of each line.
Finally, the sort command sorts the result by the 1st (-k1,1) tab-separated (-t$'\t') column by human-readable numbers (-h), such as the numbers that du -h outputs (e.g., 1K).
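To see the sort step in isolation, here is a sketch with a few hand-written rows in the size<TAB>name<TAB>lines layout. The rows are sample data for the demo and GNU sort is assumed for -h.

```shell
#!/usr/bin/env bash
# Made-up sample rows: size<TAB>name<TAB>lines, deliberately out of order.
rows=$(printf '%s\t%s\t%s\n' \
  '2.3M' './vim'      '8067' \
  '16K'  './hostname' '64' \
  '644K' './bash'     '1555')

# -t$'\t' splits on tabs, -k1,1 restricts the sort key to column 1,
# and -h (GNU) compares by magnitude, so 16K < 644K < 2.3M.
sorted=$(printf '%s\n' "$rows" | sort -h -t$'\t' -k1,1)
printf '%s\n' "$sorted"
```

A plain numeric sort (-n) would put 2.3M first here, since it ignores the unit suffix.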
[1] As with any line-oriented processing, filenames with embedded newlines are not supported, but I do not consider this a real-world problem.
Upvotes: 1
Reputation: 10433
OK, I tried it with find/-exec as well, but the escaping is hell. With a shell function it is pretty straightforward:
#!/bin/bash
function dir
{
du=$(du -sh "$1" | awk '{print $1}')
wc=$(wc -l < "$1")
printf "%10s %10s %s\n" "$du" "$wc" "${1#./}"
}
printf "%10s %10s %s\n" "size" "lines" "name"
OIFS=$IFS; IFS=""
find . -type f -print0 | while read -r -d $'\0' f; do dir "$f"; done
IFS=$OIFS
By using the bashism read -d with a NUL terminator, it is even reasonably safe. Setting IFS to the empty string is needed to keep read from stripping leading and trailing blanks in filenames.
BTW: $'\0' is not really passed as a NUL character (it ends up the same as '') - but it makes the intention clear.
Sample output:
      size      lines name
      156K        708 sash
       16K         64 hostname
      120K        460 netstat
       40K        110 fuser
      644K       1555 dir/bash
       28K         82 keyctl
      2.3M       8067 vim
Upvotes: 1
Reputation: 242208
The problem is that your shell expands the $(...) substitutions before find runs, so find never sees them. Escaping them (\$\(du -h {}\)) doesn't help either, as they then become ordinary arguments to the command, not command substitutions.
The way to have them interpreted as command substitutions is to call a new shell, either directly
find . -exec bash -c 'echo $(du -h {}) $(wc -l {})' \; | sort -h
or by creating a script and calling it from find.
Upvotes: 0