Reputation: 14448
I have a large collection of files contained in directories for testing. I need to keep the directory structure for my application but want to thin out the files for faster testing. I want to limit the number of files a directory can have to 3. How can I do that in Linux?
To clarify what I would like to accomplish, a solution in Python:
import sys, os

for root, dirs, files in os.walk(sys.argv[1]):
    for index, file in enumerate(files):
        if index > int(sys.argv[2]) - 1:
            os.remove(os.path.join(root, file))
Usage:
python thinout.py /path/to/thin\ out/ <maximum_number_of_files_per_directory>
Example:
python thinout.py testing\ data 3
I found a similar question about doing this for one directory, but not recursively.
Upvotes: 0
Views: 285
Reputation: 4038
This admittedly lengthy pipeline copes with files containing spaces etc., and leaves only the first three alphabetically sorted files in each subdirectory.
EDIT: applied mklement's improvement to cope with directories that need escaping.
find /var/testfiles/ -type d -print0 | while IFS= read -r -d '' subdir; do
    cd "$subdir"
    find . -mindepth 1 -maxdepth 1 -type f -print0 \
        | sort --zero-terminated | tr '\0' '\n' | tail -n +4 | tr '\n' '\0' \
        | xargs --null --no-run-if-empty rm
    cd "$OLDPWD"
done
Since my version of tail doesn't support a --zero or --null flag for line terminators, I had to work around that with tr. Suggestions for improvements are welcome.
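Newer GNU coreutils give tail a -z/--zero-terminated flag of its own, which would let the tr round-trip be dropped entirely. A minimal sketch of the same pipeline wrapped in a function, assuming a tail with -z is available (thin_dirs is a hypothetical name, not part of any standard tool):

```shell
#!/usr/bin/env bash
# thin_dirs DIR N -- keep only the first N alphabetically sorted files in
# every directory under DIR, deleting the rest. NUL-delimited throughout,
# so names with spaces are safe. Assumes GNU tail with the -z flag.
thin_dirs() {
    local root=$1 keep=$2
    find "$root" -type d -print0 | while IFS= read -r -d '' subdir; do
        ( cd "$subdir" || exit
          find . -mindepth 1 -maxdepth 1 -type f -print0 \
              | sort -z | tail -z -n +"$((keep + 1))" \
              | xargs -0 --no-run-if-empty rm )
    done
}
```

Usage would mirror the Python script above, e.g. `thin_dirs "testing data" 3`.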
Upvotes: 0
Reputation: 111
I would do something like this in bash:
for dir in `find . -type d`; do pushd "$dir"; rm `ls | awk 'NR>3'`; popd; done
Or this version might be better (note that `tail -n +4` starts printing at the fourth line, so the first three files survive):
for dir in `find . -type d`; do pushd "$dir"; rm `find . -maxdepth 1 -type f | tail -n +4`; popd; done
Of course - just randomly deleting all but the first 3 files in the directory is always a little risky. Buyer beware...
By the way, I did not test this myself. Just typed in what came to mind. You'll likely have to tweak it a little to get it to work right. Again, buyer beware.
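The backtick substitutions above will also break on filenames containing spaces. As a sketch, the same keep-the-first-three idea can be made whitespace-safe with a bash glob and array slicing (thin3 is a hypothetical helper name; globs expand in sorted order, which stands in for the `ls` ordering):

```shell
#!/usr/bin/env bash
# thin3 DIR -- in every directory under DIR, glob the plain files into an
# array (sorted order) and delete everything past the first three.
shopt -s nullglob
thin3() {
    find "$1" -type d -print0 | while IFS= read -r -d '' dir; do
        files=()
        for f in "$dir"/*; do
            [[ -f $f ]] && files+=("$f")
        done
        if (( ${#files[@]} > 3 )); then
            rm -- "${files[@]:3}"
        fi
    done
}
```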
Upvotes: 2