rumpel

Reputation: 83

Wrong result from xargs du | tail

I'm trying to calculate the size of an rsync backup before executing it:

du -hs -- $(rsync -avn --exclude --delete $source $target | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- |  awk 'NF' | awk '$0="$source/"$0')

It fails with the error "Argument list too long". As far as I understand, the command line is bigger than the ARG_MAX limit.

I also tried xargs, but it fails too:

rsync -avn --exclude --delete $source $target | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- | awk 'NF' | awk '$0="$source/"$0'| xargs -P4 -n9999999 du -hs -- 2>/dev/null | tail -1 | awk {'print $1'}

(It prints 23GB, but the real total is far larger.)

My question is: Is there another way to pre-calculate the size of an rsync backup?

Edit: I've got a lot of files in my home-dir.

..
/home/$USER/foo/
/home/$USER/foo/bar
/home/$USER/test/
/home/$USER/test/test
/home/$USER/blah/
...

rsync lists the directories, but in the wrong format, so du is not able to find them:

du -hs -- | $(rsync -avn --delete $HOME /tmp)
bash: sending: Command not found.

The goal I want to achieve is for du to find the files by pathname without the extra rsync output, so I grep just the filenames and prepend /home/$USER (see above). That fails with this error:

bash: /usr/bin/du: Argument list too long

or with xargs

xargs du -hs -- $(rsync -avn --delete $HOME /tmp | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- |  awk 'NF' | awk '$0="/home/MyUser/"$0' )

bash: /usr/bin/xargs: Argument list too long

If I try a subfolder containing fewer files, it works and du gives me the file sizes. Trying the same with xargs gives me nothing (blinking cursor, no output):

xargs du -hs -- $(rsync -avn $HOME/foo /tmp | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- |  awk 'NF' | awk '$0="/home/MyUser/foo/"$0' )

Upvotes: 3

Views: 129

Answers (2)

tripleee

Reputation: 189936

The reason xargs "fails" is that you discard some of its output.

To recap, when you get "argument list too long", that means the string containing your arguments was bigger than the kernel constant ARG_MAX; see e.g. Bash command line and input limit
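
As a quick check, you can ask the system what the limit is. This is only a sketch; the number differs between systems, and the limit covers the environment as well as the arguments.

```shell
# Print the limit on the combined size of arguments and environment, in bytes.
getconf ARG_MAX
```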

xargs works around this by running the same command multiple times. So for example, if echo one two three four was too long, and you split it with xargs, you'd effectively run

echo one two
echo three four

You'll notice that whereas the original output would have been a single line, this reformulation produces two lines. You need to be cognizant of things like this when using xargs.
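
You can reproduce the splitting on a small scale by forcing a batch size with -n. The value 2 is chosen only for illustration; in real use xargs picks the batch size itself.

```shell
# Force xargs to split four arguments into batches of two.
# echo runs twice, so two lines come out instead of one.
printf '%s\n' one two three four | xargs -n 2 echo
```

This prints "one two" and "three four" on separate lines, which is exactly why a trailing tail -1 only sees the last batch.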

Returning to your example, you'll need to accept that du will possibly be run multiple times, and take additional steps to collect the separate results back together. You were already using Awk; so just write a better Awk script.

rsync -avn --exclude --delete "$source" "$target" |
awk -v s="$source" '
  /bytes\/sec|deleting/ { next }
  /\// { sub(/^[^/]*\//, s "/", $0);
    print } ' |
xargs -P4 du -ks -- 2>/dev/null |
awk '{ sum += $1 }
  # du -k reports KiB, so divide by 1024*1024 to get GiB
  END { print sum / (1024*1024) }'

Notice also that we don't want to use du -h because then we can't predict what output units it will use. xargs could end up running it on a single tiny file in the last invocation, and then you'd get output in bytes or kilobytes or megabytes instead of gigabytes with du -h. We instead calculate the total in Awk, and then format it in gigabytes.

(You could get fancy and not hardcode gigabytes, but I'll leave that as an exercise. Perhaps see also File size in human readable format)
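
One possible take on that exercise, as a rough sketch: a small shell function (the name human_kib is made up here) that reads a total in KiB, such as the sum of the du -k output, and picks the unit itself.

```shell
# human_kib: read one number per line (a total in KiB) and print it
# scaled to the largest unit that keeps the value at or above 1.
# The function name and the unit list are illustrative choices.
human_kib() {
  LC_ALL=C awk '{
    n = $1
    split("KiB MiB GiB TiB", unit, " ")
    i = 1
    while (n >= 1024 && i < 4) { n /= 1024; i++ }
    printf "%.1f %s\n", n, unit[i]
  }'
}

# 23068672 KiB is exactly 22 GiB.
echo 23068672 | human_kib
```

In the pipeline you would have the final Awk step print the raw sum and pipe that into human_kib.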

I also refactored your complex grep | cut / | awk into just one Awk script, though without access to the data, it's hard to be sure that it's correct. I believe I fixed a bug where presumably you wanted to add the value of the shell variable source in the output, not the static text $source. Remember, the shell and Awk are two different languages, and don't have access to each other's variables. I'll also point to useless use of grep which has a rationale for some of this refactoring.
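
To illustrate the variable issue in isolation, here is a small sketch; the path /home/user is a placeholder, not taken from the question.

```shell
source=/home/user   # hypothetical value, for illustration only

# Inside the single-quoted Awk script, "$source/" is just a string literal;
# the shell never expands it, so the literal text ends up in the output.
echo foo/bar | awk '{ print "$source/" $0 }'

# Passing the shell value in with -v makes it available as the Awk variable s.
echo foo/bar | awk -v s="$source" '{ print s "/" $0 }'
```

The first command prints $source/foo/bar, the second prints /home/user/foo/bar.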

Finally, quote your variables and don't hardcode a large -n value in xargs when you obviously can't know how many arguments you can squeeze in; it will default to the largest possible number anyway.
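
If you are curious how many batches xargs needs on your system, something like the following gives a rough idea. The count of 100000 arguments is arbitrary, and the result depends on the platform, so no particular number should be expected.

```shell
# Feed 100000 short arguments to xargs without -n and count how many
# echo invocations it needed (each invocation prints one line).
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i }' | xargs echo | wc -l
```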

Upvotes: 5

rumpel

Reputation: 83

I was finally able to solve the problem by using rsync's --stats option and grabbing the "Total transferred file size":

rsync -avn --stats --delete "$source" "$1" | grep "Total transferred file size:" | awk '{print $5}' | tr -d '.'

It returns the size of the difference in bytes, without any error.
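
The parsing step can also be folded into a single Awk call. The sketch below assumes the number is the fifth field and that rsync prints it with "." or "," as thousands separators depending on locale (it would not handle human-readable output from -h). The function name transferred_bytes and the sample line are made up for illustration.

```shell
# transferred_bytes: extract the byte count from rsync --stats output on stdin,
# dropping "." and "," thousands separators.
transferred_bytes() {
  awk '/^Total transferred file size:/ { gsub(/[.,]/, "", $5); print $5 }'
}

# Made-up sample line in the shape rsync prints; prints 1234567.
echo 'Total transferred file size: 1.234.567 bytes' | transferred_bytes
```

In real use you would pipe rsync into it: rsync -avn --stats --delete "$source" "$1" | transferred_bytes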

Upvotes: 2
