Reputation: 2460
I'm attempting to eliminate duplicate files from a filesystem with around 12,000 decent-size (150+ MB) files. I expect 20-50 duplicates in the set.
Rather than do a checksum on every single file, which is relatively demanding, my idea was to build a hash listing every file and its filesize, eliminate entries where the filesize is unique, and only do a checksum on the remainders, saving a lot of time.
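For reference, a name => size hash like that could be built with something along these lines (the directory path and glob pattern here are placeholders, not from the question):

# Hypothetical starting point: walk a directory tree and record each file's size.
files = Dir.glob("/path/to/files/**/*")
           .select { |path| File.file?(path) }
           .map { |path| [path, File.size(path)] }
           .to_h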
However I'm having a bit of trouble stripping the hash down to just the non-unique entries. I tried the following, where files is a hash like "super_cool_map.png" => 1073741824:
uniques = files.values.uniq
dupes = files.delete_if do |k,v|
  uniques.include?(v)
end
puts dupes
But that only outputs an empty hash. What should I do?
Upvotes: 0
Views: 63
Reputation: 118271
How about this?
# returns an array of name groups, one per file size that occurs more than once
files.group_by(&:last).map { |_, v| v.map(&:first) if v.size > 1 }.compact
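For instance, with a small hypothetical name => size hash (names and sizes invented for illustration), it yields:

# Hypothetical sample in the question's name => size shape.
files = { "a.png" => 100, "b.png" => 100, "c.png" => 200, "d.png" => 300 }
files.group_by(&:last).map { |_, v| v.map(&:first) if v.size > 1 }.compact
# => [["a.png", "b.png"]]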
Upvotes: 2
Reputation: 59611
Why not reverse the mapping? Make the keys the file sizes and the values lists
of file names. That way you get "grouping by size" for free.
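A minimal sketch of that reversal, assuming files is the name => size hash from the question:

# Invert name => size into size => [names]; each bucket collects files of one size.
by_size = Hash.new { |hash, size| hash[size] = [] }
files.each { |name, size| by_size[size] << name }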
Then you can filter that hash like this:
my_hash = {30323 => ["file1", "file2"], 233 => ["file3"]}
filtered = my_hash.select { |k, v| v.size > 1 }
p filtered # prints {30323 => ["file1", "file2"]}
Now you have a hash where each key corresponds to a list of files you need to hash and compare to each other.
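From there, a sketch of the final step (assuming filtered is the size => names hash from above): checksum only those candidates with Ruby's standard digest library and keep the names whose digests collide.

require 'digest'

# Only files that share a size get read and hashed.
dupes = filtered.values.flat_map do |names|
  names.group_by { |name| Digest::SHA256.file(name).hexdigest }
       .values
       .select { |group| group.size > 1 }
end
# dupes is an array of groups; each group lists paths with identical contents.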
Upvotes: 2