Reputation: 247
As far as I know, commands like
find <dir> -type f -exec rm {} \;
are not the best option for removing a large number of files (total count, including files in subfolders). They work fine with a small number of files, but with 10+ million files spread across subfolders they can hang a server.
Does anyone know of any specific Linux commands that solve this problem?
Upvotes: 7
Views: 16378
Reputation: 646
mv large_folder /tmp/.
sudo reboot
The call to mv is fast: as long as large_folder is on the same filesystem as /tmp, it only renames a directory entry. The system reboot will then clear the /tmp folder (typically by re-mounting it or cleaning it at boot) in the fastest way possible.
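A quick way to check that assumption (a sketch; large_folder is the placeholder from above):
# mv is only a cheap rename when both paths live on the same filesystem;
# df shows which filesystem each of them is on
df large_folder /tmp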
Upvotes: 0
Reputation: 11
If you would like to delete tons of files as quickly as possible, try this:
find . -type f -print0 | xargs -P 0 -0 rm -f
Note that the -P 0 option makes xargs run as many rm processes as possible in parallel.
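If you prefer to cap the parallelism instead of letting xargs spawn freely, something along these lines (a sketch; 4 workers and 1000 files per rm invocation are arbitrary choices) keeps the load more predictable:
# run at most 4 rm processes at a time, each given up to 1000 file names
find . -type f -print0 | xargs -0 -P 4 -n 1000 rm -f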
Upvotes: 0
Reputation: 69
You can create an empty directory and rsync it to the directory you need to empty. This avoids timeouts and out-of-memory issues.
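For example, a minimal sketch (empty_dir and target_dir are placeholder names; --delete tells rsync to remove anything in the target that is missing from the empty source):
mkdir empty_dir
rsync -a --delete empty_dir/ target_dir/
rmdir empty_dir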
Upvotes: -1
Reputation: 21
The previous commands are good.
rm -rf directory/
also works fast, even with billions of files in one folder. I tried it.
Upvotes: 1
Reputation: 175
If you have a reasonably modern version of find (4.2.3 or greater), you can use the -delete flag.
find <dir> -type f -delete
If you have version 4.2.12 or greater, you can take advantage of xargs-style command-line stacking via the -exec ... \+ modifier. This way you don't run a separate copy of /bin/rm for every file.
find <dir> -type f -exec rm {} \+
Upvotes: 0
Reputation: 16971
If you need to deal with a disk-space limit on a very large file tree (in my case, many Perforce branches), and the find-and-delete process sometimes hangs:
Here's a script that I schedule daily to find all directories containing a specific file ("ChangesLog.txt"), sort the directories found that are older than 2 days, and remove the first matched directory (each run there could be a new match):
bash -c "echo @echo Creating Cleanup_Branch.cmd on %COMPUTERNAME% - %~dp0 > Cleanup_Branch.cmd"
bash -c "echo -n 'bash -c \"find ' >> Cleanup_Branch.cmd"
rm -f dirToDelete.txt
rem cd. > dirToDelete.txt
bash -c "find .. -maxdepth 9 -regex ".+ChangesLog.txt" -exec echo {} >> dirToDelete.txt \; & pid=$!; sleep 100; kill $pid "
sed -e 's/\(.*\)\/.*/\1/' -e 's/^./"&/;s/.$/&" /' dirToDelete.txt | tr '\n' ' ' >> Cleanup_Branch.cmd
bash -c "echo -n '-maxdepth 0 -type d -mtime +2 | xargs -r ls -trd | head -n1 | xargs -t rm -Rf' >> Cleanup_Branch.cmd"
bash -c 'echo -n \" >> Cleanup_Branch.cmd'
call Cleanup_Branch.cmd
Note the requirements:
Upvotes: 0
Reputation: 12677
I tried every one of these commands, but the problem I had was that the deletion process was locking the disk, and since no other processes could access it, there was a big pile-up of processes trying to access the disk, which made the problem worse. Run "iotop" to see how much disk IO your process is using.
Here's the Python script that solved my problem. It deletes 500 files at a time, then takes a 2-second break to let the other processes do their business, then continues.
import os, os.path
import time

# walk the tree, deleting files in batches and pausing so that
# other processes get a chance to use the disk
for root, dirs, files in os.walk('/dir/to/delete/files'):
    i = 0
    file_num = 0
    for f in files:
        fullpath = os.path.join(root, f)
        i = i + 1
        file_num = file_num + 1
        os.remove(fullpath)
        # roughly every 500 deletions, sleep for 2 seconds
        if i % 500 == 1:
            time.sleep(2)
    print("Deleted %i files" % file_num)
Hope this helps some people.
Upvotes: 0
Reputation: 38032
Here's an example bash script:
#!/bin/bash
LOCKFILE=/tmp/rmHugeNumberOfFiles.lock
# this process gets ultra-low priority
ionice -c2 -n7 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not set disk IO priority. Exiting..."
    exit
fi
renice +19 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not renice process. Exiting..."
    exit
fi
# check if there's an instance running already. If so--exit
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
    echo "An instance of this script is already running."
    exit
fi
# make sure the lockfile is removed when we exit. Then: claim the lock
trap "command rm -f -- $LOCKFILE; exit" INT TERM EXIT
echo $$ > $LOCKFILE
# also create a tempfile, and make sure that's removed too upon exit
tmp=$(tempfile) || exit
trap "command rm -f -- '$tmp' $LOCKFILE; exit" INT TERM EXIT
# ----------------------------------------
# option 1
# ----------------------------------------
# find your specific files
find "$1" -type f [INSERT SPECIFIC SEARCH PATTERN HERE] > "$tmp"
xargs -d '\n' rm -- < "$tmp"   # feed the collected paths to rm in large batches
# ----------------------------------------
# option 2
# ----------------------------------------
command rm -r "$1"
# remove the lockfile, tempfile
command rm -f -- "$tmp" $LOCKFILE
This script starts by setting its own process priority and disk IO priority to very low values, to ensure other running processes are as unaffected as possible.
Then it makes sure that it is the ONLY such process running.
The core of the script is really up to your preference. You can use rm -r if you are sure that the whole directory can be deleted indiscriminately (option 2), or you can use find for more specific file deletion (option 1, possibly using command-line options "$2" and onward for convenience).
In the implementation above, option 1 (find) first outputs everything to a tempfile, so that rm is invoked once (or at least in a few large batches) instead of once per file found by find. When the number of files is indeed huge, this can amount to a significant time saving. On the downside, the size of the tempfile may become an issue, but that is only likely if you're deleting literally billions of files. Also, because the disk IO has such low priority, using a tempfile followed by a single rm may in total be slower than using the find (...) -exec rm {} \; option. As always, you should experiment a bit to see what best fits your needs.
EDIT: As suggested by user946850, you can also skip the tempfile entirely and use find (...) -print0 | xargs -0 rm. This has a larger memory footprint, since the full paths of all matching files may be held in RAM until the find command is completely finished. On the upside, there is no additional file IO due to writes to the tempfile. Which one to choose depends on your use case.
Upvotes: 6
Reputation: 25444
The -r (recursive) switch removes everything below a directory, too, including subdirectories. (Your find command removes only the files, not the directories.)
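If you stay with the find-based approach and also want the now-empty subdirectories gone, a follow-up pass along these lines (a sketch relying on GNU find) can clean them up:
# remove empty directories, deepest first
find <dir> -depth -type d -empty -delete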
You can also speed up the find approach:
find -type f -print0 | xargs -0 rm
Upvotes: 1