Reputation: 83
I'm trying to calculate the size of an rsync backup before executing it:
du -hs -- $(rsync -avn --exclude --delete $source $target | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- | awk 'NF' | awk '$0="$source/"$0')
It fails with the error "Argument list too long". As far as I understand, the command line is bigger than the ARG_MAX limit.
I also tried xargs, but it fails too:
rsync -avn --exclude --delete $source $target | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- | awk 'NF' | awk '$0="$source/"$0'| xargs -P4 -n9999999 du -hs -- 2>/dev/null | tail -1 | awk {'print $1'}
(It ends with the output 23GB, but the real size is far larger.)
My question is: Is there another way to pre-calculate the size of an rsync backup?
Edit: I've got a lot of files in my home-dir.
..
/home/$USER/foo/
/home/$USER/foo/bar
/home/$USER/test/
/home/$USER/test/test
/home/$USER/blah/
...
rsync lists the directories, but in the wrong format, so du is not able to find them:
du -hs -- | $(rsync -avn --delete $HOME /tmp)
bash: sending: Command not found.
The goal I want to achieve is for du to find the files by pathname, without the extra rsync output, so I grep for the filenames and prepend /home/$USER as a prefix (see above). That gives this error:
bash: /usr/bin/du: Argument list too long
or with xargs
xargs du -hs -- $(rsync -avn --delete $HOME /tmp | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- | awk 'NF' | awk '$0="/home/MyUser/"$0' )
bash: /usr/bin/xargs: Argument list too long
If I try a subfolder containing fewer files, it works, and du gives me the file sizes of the data. Trying the same with xargs gives me nothing (a blinking cursor, no output).
xargs du -hs -- $(rsync -avn $HOME/foo /tmp | grep / | grep -v " bytes/sec" | grep -v "deleting " | cut -d "/" -f2- | awk 'NF' | awk '$0="/home/MyUser/foo/"$0' )
Upvotes: 3
Views: 129
Reputation: 189936
The reason xargs "fails" is that you discard some of its output.
To recap, when you get "argument list too long", that means the string containing your arguments was bigger than the kernel constant ARG_MAX; see e.g. Bash command line and input limit.
xargs works around this by running the same command multiple times. So for example, if echo one two three four was too long, and you split it with xargs, you'd effectively run
echo one two
echo three four
You'll notice that whereas the original output would have been a single line, this reformulation produces two lines. You need to be cognizant of things like this when using xargs.
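To see the splitting in action, here is a small sketch that forces xargs to batch two arguments at a time with -n 2. The function name split_demo is made up for illustration; in real use xargs picks the batch size itself, based on the system limit.

```shell
# With -n 2, xargs passes at most two arguments per invocation,
# so echo runs twice and prints two lines instead of one.
split_demo() {
    printf '%s\n' one two three four | xargs -n 2 echo
}

split_demo
```

The same thing happens with du -s: each invocation prints its own lines, so anything that keeps only the last line (such as tail -1) sees only a fragment of the result.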
Returning to your example, you'll need to accept that du will possibly be run multiple times, and take additional steps to collect the separate results back together. You were already using Awk; so just write a better Awk script.
rsync -avn --exclude --delete "$source" "$target" |
awk -v s="$source" '
/bytes\/sec|deleting/ { next }
/\// { sub(/^[^/]*\//, s "/", $0);
print } ' |
xargs -P4 du -ks -- 2>/dev/null |
awk '{ sum += $1 }
END { print sum / (1024*1024) }'   # du -k reports KiB; KiB / 1024^2 = GiB
Notice also that we don't want to use du -h because then we can't predict what output units it will use. xargs could end up running it on a single tiny file in the last invocation, and then you'd get output in bytes or kilobytes or megabytes instead of gigabytes with du -h. We instead calculate the total in Awk, and then format it in gigabytes.
(You could get fancy and not hardcode gigabytes, but I'll leave that as an exercise. Perhaps see also File size in human readable format)
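One possible take on that exercise, as a rough sketch rather than a polished tool: sum the KiB counts that du -ks prints, then divide by 1024 until the value fits a unit. The function name human_kib, the unit list, and the sample paths are my own choices, not anything rsync or du provides.

```shell
# human_kib: read KiB counts (first column, as printed by `du -ks`)
# on stdin, sum them, and print the total with a binary unit suffix.
human_kib() {
    awk '
        { sum += $1 }
        END {
            n = split("KiB MiB GiB TiB PiB", unit, " ")
            i = 1
            # Step up one unit for every factor of 1024.
            while (sum >= 1024 && i < n) { sum /= 1024; i++ }
            printf "%.1f %s\n", sum, unit[i]
        }'
}

# Two made-up du -ks lines: 512 MiB plus 1536 MiB.
printf '%s\n' '524288 /home/user/foo' '1572864 /home/user/bar' | human_kib
```

In the pipeline above, this function could stand in for the final awk stage, after xargs -P4 du -ks.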
I also refactored your complex grep | cut | awk into just one Awk script, though without access to the data, it's hard to be sure that it's correct. I believe I fixed a bug where presumably you wanted to add the value of the shell variable source in the output, not the static text $source. Remember, the shell and Awk are two different languages, and don't have access to each other's variables. I'll also point to useless use of grep, which has a rationale for some of this refactoring.
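A small sketch of the difference, using a made-up path /home/demo and two hypothetical helper functions: inside single quotes the shell leaves $source alone, and to Awk it is just text inside a string literal, whereas -v copies the shell value into an Awk variable.

```shell
source=/home/demo   # hypothetical path, for illustration only

# "$source/" is an Awk string literal here, so the output is
# prefixed with the literal text $source/, not with the path.
prefix_literal() {
    echo 'foo/bar' | awk '$0="$source/"$0'
}

# -v s="$source" hands the shell value to Awk as the variable s.
prefix_from_shell() {
    echo 'foo/bar' | awk -v s="$source" '{ print s "/" $0 }'
}

prefix_literal
prefix_from_shell
```

The first call prints the path with the literal text $source in front; only the second call produces a path that du could actually find.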
Finally, quote your variables and don't hardcode a large -n value in xargs when you obviously can't know how many arguments you can squeeze in; it will default to the largest possible number anyway.
Upvotes: 5
Reputation: 83
I was finally able to solve the problem by using rsync's --stats option and grabbing the "Total transferred file size":
rsync -avn --stats --delete $source $1 | grep "Total transferred file size:" | awk {'print $5'} | tr -d '.'
It returns the difference in bytes without any error.
Upvotes: 2