Reputation: 298
I want to count the total number of lines in all files under the directory /usr/local/lib/python3.5/dist-packages/pandas.
cd /usr/local/lib/python3.5/dist-packages/pandas
find -name '*.*' |xargs wc -l
536577 total
Then I combined the two commands into one:
find /usr/local/lib/python3.5/dist-packages/pandas -name '*.*' |xargs wc -l
This time bash prints two totals: one is 495736 and the other is 40841, and 495736 + 40841 = 536577.
Why doesn't bash print a single total of 536577 at the bottom, the way find -name '*.*' |xargs wc -l does?
Upvotes: 1
Views: 430
Reputation: 50785
POSIX xargs spec. says:
The generated command line length shall be the sum of the size in bytes of the utility name and each argument treated as strings, including a null byte terminator for each of these strings. The xargs utility shall limit the command line length such that when the command line is invoked, the combined argument and environment lists shall not exceed {ARG_MAX}-2048 bytes.
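That means that, in your case, find's output does not fit in ARG_MAX-2048 bytes, so xargs splits it into 2 sets and invokes wc once for each set, and each wc invocation prints its own total. The long absolute path prefix on every file name is presumably what pushes the second command over the limit while the first one, run from inside the directory, stays under it.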
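The splitting can be reproduced on a small scale. This is only an illustrative sketch: it forces tiny batches with the POSIX -n option instead of relying on the byte limit, and the five sample arguments are made up.

```shell
# Five arguments, at most two per invocation, so echo runs three times
# and prints three lines ("a b", "c d", "e").
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo | wc -l | tr -d ' ')
echo "$batches"
```

Each extra invocation of wc -l would likewise add its own total line.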
Take this pipeline for example. In an ideal world its output would be 1, but it is not:
seq 1000000 | xargs echo | wc -l
seq's output is 6888896 bytes.
$ seq 1000000 | wc -c
6888896
My environment list takes up 558 bytes (ignoring, for the sake of clarity, that _ is dynamic and whether the implementation takes terminating null pointers into consideration).
$ env | wc -c
558
ARG_MAX on my system is 131072 bytes.
$ getconf ARG_MAX
131072
Now xargs has 131072-2048-558 = 128466 bytes to work with; echo plus its terminating null byte takes up 5 bytes, so 128461 bytes are left for arguments. Therefore xargs will have to invoke echo 6888896/128461 ≈ 53.6 times, which rounds up to 54. Let's see if that's the case:
$ seq 1000000 | xargs echo | wc -l
54
Yes, it is.
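The same estimate can be written as a few lines of shell arithmetic. This is a rough sketch that hard-codes the figures quoted above; on another system the values would come from getconf ARG_MAX, env | wc -c and the real input size, and the variable names are only illustrative.

```shell
# Figures copied from this answer, not measured here.
arg_max=131072        # getconf ARG_MAX
env_bytes=558         # env | wc -c
input_bytes=6888896   # seq 1000000 | wc -c
cmd_bytes=5           # "echo" plus its terminating null byte

# Bytes left for arguments in each invocation.
budget=$(( arg_max - 2048 - env_bytes - cmd_bytes ))

# Round up: a partially filled last batch still costs one invocation.
invocations=$(( (input_bytes + budget - 1) / budget ))

echo "budget=$budget invocations=$invocations"
```

With these numbers it prints budget=128461 invocations=54, matching the count observed above.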
Upvotes: 5
Reputation: 52529
You can deal with xargs running the command multiple times by adding an awk step to the pipeline:
find wherever -name "*.*" -type f -print0 | \
xargs -0 wc -l | \
awk '$2 == "total" { total += $1 } END { print "Overall total", total } 1'
(Assuming GNU find and xargs, or other implementations that understand -print0 and -0 respectively; otherwise filenames with spaces etc. in them can cause problems.)
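To see what the awk step does in isolation, it can be fed the two totals reported in the question. This is only a sketch: the printf stands in for real wc output, which would also contain the per-file lines.

```shell
# Simulated output of two wc invocations (the totals from the question).
# The trailing "1" pattern passes every input line through unchanged.
result=$(printf '495736 total\n40841 total\n' |
  awk '$2 == "total" { total += $1 } END { print "Overall total", total } 1')
echo "$result"
```

The last line of the output is Overall total 536577, the sum of the two partial totals.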
GNU find, and other implementations that support the POSIX -exec ... + form, can actually skip xargs:
find wherever -name "*.*" -type f -exec wc -l '{}' '+'
will have the same effect as using xargs to run wc on multiple files at a time, so a very long file list can still produce more than one total line.
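If only the grand total matters, one possible variation (not part of the commands above) is to concatenate the files and count once, so the number of batches no longer affects the result. The sketch below builds a throwaway directory with made-up files purely for demonstration; replace "$dir" with the real directory.

```shell
# Hypothetical demo tree: two files with 2 + 1 = 3 lines in total.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.py"
printf 'c\n' > "$dir/two.txt"

# cat may still be run in several batches, but wc -l reads one
# combined stream and therefore prints a single number.
total=$(find "$dir" -name '*.*' -type f -exec cat {} + | wc -l | tr -d ' ')
echo "$total"

rm -rf "$dir"
```

The trade-off is that the per-file counts are lost; the awk approach keeps them.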
Upvotes: 0