Richard J.

Reputation: 21

Removing Duplicate Files (Linux)

I'm using fdupes currently to locate and remove duplicate files (ubuntu 20.10). Fast and handy, but it only takes one target filesystem tree to search.

I want to say, "Here (i.e. ./Pictures/2018) are my originals (the ones to keep), find all duplicates anywhere else in ./Pictures (and optionally delete them.)"

Do any tools exist that would let me do that, rather than going through the list of 2100 files and carefully removing all but the keepers (or writing a program myself)?

Upvotes: 2

Views: 744

Answers (2)

Piotr Kołaczkowski

Reputation: 2629

Fclones can scan multiple directory trees. It is also many times faster than fdupes.

If you need to remove duplicate files from one directory tree that match files in another directory, use the --isolate option. Make sure to list your originals as the first argument.

# find the duplicates:
fclones group --isolate originals copies >dupes.txt  

# remove the duplicates:
fclones remove <dupes.txt  

Longer demo:

pkolaczk@p5520:~/Temp$ mkdir files
pkolaczk@p5520:~/Temp$ mkdir files/originals
pkolaczk@p5520:~/Temp$ echo foo1 >files/foo1.txt
pkolaczk@p5520:~/Temp$ echo foo2 >files/foo2.txt
pkolaczk@p5520:~/Temp$ echo foo1 >files/originals/foo1.txt
pkolaczk@p5520:~/Temp$ echo foo2 >files/originals/foo2.txt
pkolaczk@p5520:~/Temp$ fclones group --isolate files/originals files >dupes.txt
[2022-05-13 17:51:37.759] fclones:  info: Started grouping
[2022-05-13 17:51:37.761] fclones:  info: Scanned 9 file entries
[2022-05-13 17:51:37.761] fclones:  info: Found 6 (30 B) files matching selection criteria
[2022-05-13 17:51:37.761] fclones:  info: Found 2 (10 B) candidates after grouping by size
[2022-05-13 17:51:37.761] fclones:  info: Found 2 (10 B) candidates after grouping by paths and file identifiers
[2022-05-13 17:51:37.763] fclones:  info: Found 2 (10 B) candidates after grouping by prefix
[2022-05-13 17:51:37.763] fclones:  info: Found 2 (10 B) candidates after grouping by suffix
[2022-05-13 17:51:37.763] fclones:  info: Found 2 (10 B) redundant files
pkolaczk@p5520:~/Temp$ fclones remove --dry-run <dupes.txt 
[2022-05-13 17:51:51.938] fclones:  info: Started deduplicating (dry run)
rm /home/pkolaczk/Temp/files/foo2.txt
rm /home/pkolaczk/Temp/files/foo1.txt
[2022-05-13 17:51:51.939] fclones:  info: Would process 2 files and reclaim 10 B space
pkolaczk@p5520:~/Temp$ fclones remove <dupes.txt 
[2022-05-13 17:57:08.220] fclones:  info: Started deduplicating
[2022-05-13 17:57:08.226] fclones:  info: Processed 2 files and reclaimed 10 B space
pkolaczk@p5520:~/Temp$ ls files
originals
pkolaczk@p5520:~/Temp$ ls files/originals
foo1.txt  foo2.txt

More information: https://github.com/pkolaczk/fclones

Upvotes: 1

user13726895

Reputation: 35

The best choice is rdfind.

See: https://rdfind.pauldreik.se/
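rdfind ranks its arguments by order: files under the first directory listed are the highest ranked and are kept, while identical files found under later arguments are treated as the duplicates. A runnable sketch of the question's workflow (the temp-dir fixture is illustrative; it skips cleanly if rdfind is not installed):

```shell
# rdfind keeps files under the FIRST path listed and treats matching
# files found under later paths as duplicates.
command -v rdfind >/dev/null || exit 0   # skip if rdfind is absent

tmp=$(mktemp -d)
mkdir -p "$tmp/Pictures/2018"
echo img1 > "$tmp/Pictures/2018/a.jpg"   # original (to keep)
echo img1 > "$tmp/Pictures/a.jpg"        # duplicate elsewhere

# Preview what would happen, then actually delete the duplicates.
# -makeresultsfile false suppresses the results.txt report.
rdfind -dryrun true -makeresultsfile false \
    "$tmp/Pictures/2018" "$tmp/Pictures" >/dev/null
rdfind -deleteduplicates true -makeresultsfile false \
    "$tmp/Pictures/2018" "$tmp/Pictures" >/dev/null

ls "$tmp/Pictures"
```

Since ./Pictures/2018 is also inside ./Pictures, rdfind sees those files twice, but it recognizes identical device/inode pairs and keeps the higher-ranked entry, so the originals are safe.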

Upvotes: 2
