Reputation: 351
I'm trying to use rsync to back up MySQL data. The tables use the MyISAM storage engine.
My expectation was that after the first rsync, subsequent rsyncs would be very fast. It turns out that if the table data has changed at all, the operation slows way down.
I did an experiment with a 989 MB MYD file containing real data:
Test 1 - recopying unmodified data
rsync -a orig.MYD copy.MYD
rsync -a orig.MYD copy.MYD
Test 2 - recopying slightly modified data
rsync -a orig.MYD copy.MYD
UPDATE table SET counter = counter + 1 WHERE id = 12345
rsync -a orig.MYD copy.MYD
What gives? Why is rsync taking forever just to copy a tiny change?
Edit: In fact, the second rsync in Test 2 takes as long as the first. rsync is apparently copying the whole file again.
Edit: Turns out when copying from local to local, --whole-file is implied. Even with --no-whole-file, the performance is still terrible.
Upvotes: 0
Views: 2388
Reputation: 62593
When doing local copies, rsync defaults to --whole-file for a reason: it's faster than doing the checks.
rsync for local copies is a nice replacement for cp when you have a big directory where only some files change. It'll copy those files whole, but quickly skip the ones that weren't modified (just checking timestamp and file size). For a single big file, it's no better than cp.
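To make the "just checking timestamps and filesize" step concrete, here is a minimal Python sketch of that kind of quick check. The function name `quick_check_unchanged` is made up for illustration, and comparing modification times at whole-second granularity is an assumption of this sketch; it is not rsync's actual code.

```python
import os


def quick_check_unchanged(src: str, dst: str) -> bool:
    """Return True if dst looks unchanged relative to src, judging only by
    file size and modification time (no file contents are read)."""
    try:
        s, d = os.stat(src), os.stat(dst)
    except FileNotFoundError:
        # A missing file can never be skipped.
        return False
    # Assumption: compare mtimes at whole-second granularity.
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)
```

A check like this costs two stat calls no matter how large the file is, which would explain why the unmodified case is so cheap. A single UPDATE changes the .MYD file's modification time, so the check fails and the file is handled in full.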
Upvotes: 0
Reputation: 14149
rsync is file-based. If you found a way of doing it with a block-based system, you could back up just the blocks/bytes that had changed.
LVM snapshots might be one way of doing this.
Upvotes: 0
Reputation: 662
rsync uses an algorithm that first checks whether a file has changed and then works out which parts of it changed. In a large database, it is common for changes to be spread throughout a large segment of the file. This is rsync's worst-case scenario.
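As a rough illustration of why scattered changes hurt, the sketch below counts how many fixed-size blocks are touched by a set of in-place byte modifications. The helper names and the block size are hypothetical. It assumes bytes are overwritten in place with nothing inserted, and it ignores the details of rsync's real block matching.

```python
def dirty_blocks(modified_offsets, block_size):
    """Return the sorted indices of fixed-size blocks touched by
    in-place changes at the given byte offsets."""
    return sorted({offset // block_size for offset in modified_offsets})


def bytes_to_send(modified_offsets, block_size):
    """Upper bound on the data to transfer if every touched block is
    resent whole (the final block of a file may be shorter)."""
    return len(dirty_blocks(modified_offsets, block_size)) * block_size


# Three changes packed together touch one block ...
clustered = bytes_to_send([10, 20, 30], block_size=1024)
# ... while three changes spread out touch three blocks.
scattered = bytes_to_send([10, 5000, 90000], block_size=1024)
```

With the same number of modified bytes, the scattered case marks three times as many blocks as changed. If updates land all over a table file, most blocks end up marked and the delta approaches the size of the whole file.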
Upvotes: 0
Reputation: 189686
rsync still has to calculate block hashes to determine what's changed. It may be that the no-modification case is a shortcut that looks only at the file's modification time and size.
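A simplified sketch of the block-hash idea, to show where the time goes: finding even a one-byte change means reading and hashing every block of both files. The function names, the 64 KiB block size, and the use of SHA-256 are assumptions made for illustration. rsync's real algorithm is different: it pairs a rolling weak checksum with a strong hash so that it can also match blocks that have shifted position.

```python
import hashlib


def block_digests(path, block_size=64 * 1024):
    """Read the whole file and return one digest per fixed-size block."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            digests.append(hashlib.sha256(chunk).digest())
    return digests


def changed_blocks(old_path, new_path, block_size=64 * 1024):
    """Return the indices of blocks whose digests differ between two files."""
    old = block_digests(old_path, block_size)
    new = block_digests(new_path, block_size)
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    # Blocks that exist in only one of the files count as changed.
    changed.extend(range(min(len(old), len(new)), max(len(old), len(new))))
    return changed
```

For a 989 MB file this reads roughly 2 GB from disk before a single changed block is known. On a local copy that is more I/O than simply copying the file, which is consistent with --no-whole-file not helping here.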
Upvotes: 1