showkey

Reputation: 298

xargs wc -l reports two totals

I want to count all the lines in the files under the directory /usr/local/lib/python3.5/dist-packages/pandas.

cd /usr/local/lib/python3.5/dist-packages/pandas
find  -name '*.*' |xargs  wc -l
536577 total

Then I combine the two commands into a single line:

find /usr/local/lib/python3.5/dist-packages/pandas -name '*.*' | xargs wc -l


This time the output contains two totals: one is 495736 and the other is 40841.

495736 + 40841 = 536577

Why is there not a single total of 536577 at the bottom, as there is with find -name '*.*' |xargs wc -l?

Upvotes: 1

Views: 430

Answers (2)

oguz ismail

Reputation: 50785

The POSIX xargs specification says:

The generated command line length shall be the sum of the size in bytes of the utility name and each argument treated as strings, including a null byte terminator for each of these strings. The xargs utility shall limit the command line length such that when the command line is invoked, the combined argument and environment lists shall not exceed {ARG_MAX}-2048 bytes.

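This means that, in your case, find's output does not fit in ARG_MAX - 2048 bytes, so xargs splits it into two sets and invokes wc once for each set. Each invocation of wc prints its own total. The absolute paths in your second command are longer than the relative ones in the first, which is presumably what pushes the argument list over the limit.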
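The splitting can be reproduced on a small scale. This is only a sketch: it uses -n to force tiny batches instead of relying on the real size limit, and the scratch directory and file names are made up for the demonstration.

```shell
# Sketch: force small batches with -n so the split is easy to see.
tmp=$(mktemp -d)                      # scratch directory (hypothetical test data)
for i in 1 2 3 4; do
    printf 'a\nb\n' > "$tmp/f$i.txt"  # four files, two lines each
done

# -n 2 makes xargs run wc twice (two files per run),
# so wc prints two separate "total" lines.
totals=$(find "$tmp" -name '*.txt' -type f | xargs -n 2 wc -l | grep -c ' total$')
echo "$totals"

rm -rf "$tmp"
```

This prints 2, one "total" line per wc invocation, which is the same effect you see with the long absolute paths.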


Take this pipeline as an example. In an ideal world its output would be 1, but it is not:

seq 1000000 | xargs echo | wc -l

seq's output is 6888896 bytes.

$ seq 1000000 | wc -c
6888896

My environment list takes up 558 bytes (for the sake of clarity, ignoring that _ is dynamic and whether the implementation counts the terminating null pointers).

$ env | wc -c
558

ARG_MAX on my system is 131072 bytes.

$ getconf ARG_MAX
131072

Now xargs has 131072 - 2048 - 558 = 128466 bytes; echo plus its null terminator takes up 5 bytes, so 128461 bytes are left. Therefore xargs will have to invoke echo 6888896/128461 ≈ 53.6 times, which rounds up to 54. Let's see if that's the case:

$ seq 1000000 | xargs echo | wc -l
54

Yes, it is.
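The same arithmetic as a small shell sketch, with the numbers from my system hard-coded. The variable names are just for illustration, and a real xargs may apply its own, smaller limit (GNU xargs reports what it uses with --show-limits), so treat this as an estimate rather than a guarantee.

```shell
# Sketch of the estimate above; all values are the ones measured in this answer.
arg_max=131072        # getconf ARG_MAX
reserved=2048         # headroom required by POSIX
env_bytes=558         # env | wc -c
cmd_bytes=5           # "echo" plus its null terminator
input_bytes=6888896   # seq 1000000 | wc -c

# Bytes available for arguments in each invocation.
budget=$((arg_max - reserved - env_bytes - cmd_bytes))

# Ceiling division: a partially filled last batch still needs one invocation.
invocations=$(( (input_bytes + budget - 1) / budget ))

echo "$budget $invocations"
```

This prints 128461 54, matching the count observed above.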

Upvotes: 5

Shawn

Reputation: 52529

You can deal with xargs running the command multiple times by adding an awk bit to the pipeline:

find wherever -name "*.*" -type f -print0 | \
xargs -0 wc -l | \
awk '$2 == "total" { total += $1 } END { print "Overall total", total } 1'

(This assumes GNU find and xargs, or other implementations that understand -print0 and -0 respectively; without them, filenames containing spaces and the like can cause problems.)

Actually, you can skip xargs altogether. The -exec ... {} + form of find is specified by POSIX (and supported by GNU find), and it batches the file names itself:

find wherever -name "*.*" -type f -exec wc -l '{}' '+'

will have the same effect as using xargs to run wc on multiple files at a time, including printing more than one total if the file list is split into several batches.
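If only the grand total matters and the per-file counts are not needed, another option is to concatenate everything and count once, so that batching no longer affects the result. A minimal sketch, where count_lines is a hypothetical helper name and not an existing tool:

```shell
# count_lines DIR: print the total number of lines in all files under DIR
# whose names contain a dot. Hypothetical helper, shown only as a sketch.
count_lines() {
    # cat may be run several times by find, but its output is one stream,
    # so the single wc at the end of the pipe prints exactly one number.
    find "$1" -name '*.*' -type f -exec cat '{}' '+' | wc -l
}
```

For example, count_lines /usr/local/lib/python3.5/dist-packages/pandas. Since wc -l counts newline characters, the result should equal the sum of the per-file counts.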

Upvotes: 0
