Asmat Ali

Reputation: 335

How to remove duplicate files with same name but different extension?

I have a large number of images in a directory. The problem with some of the images is that they have duplicates with the same name but a different extension, e.g. image1.jpg, image1.jpeg and image1.png, which are all the same image with the same name but different extensions. How can I find and remove these duplicates using Java? There are a lot of tools for finding duplicates, but I can't find any tool or script for this specific problem. Any help would be greatly appreciated.

Upvotes: 1

Views: 2636

Answers (3)

Yahya

Reputation: 14072

Here is an MCVE:

This example uses a Set to remove duplicate images automatically; you only need to provide the path of the folder that contains the images. (It is just a different idea, to show the other available options and how to make use of OO features in Java.)

import java.io.File;
import java.util.HashSet;
import java.util.Set;

public class DuplicateRemover {

    // inner class to represent an image
    class Image{
        String path; // the absolute path of image file as a String

        // constructor
        public Image(String path) {
            this.path = path;
        }       

        @Override
        public boolean equals(Object o) {
            if(o instanceof Image){
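                // if both base names are equal, delete this Image's file
                // (in OpenJDK's HashSet, "this" is the element being added); returning true keeps it out of the set
                if(getBaseName(this.path).equals(getBaseName(((Image)o).path))){
                    File file = new File(this.path);
                    return file.delete();
                }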
            }
            return false;
        }

        @Override
        public int hashCode() {
            return 0; // same hash for every Image, so the set always falls back to equals() for the duplicate check
        }

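        /**
         * Returns the path without its extension, e.g. ".../image1.jpg" -> ".../image1"
         * @param fileName the absolute path of the image file
         * @return the path up to (not including) the last '.' of the file name
         */
        private String getBaseName(String fileName) {
            int index = fileName.lastIndexOf('.');
            // ignore a dot that belongs to a directory name rather than to the file name
            if (index <= fileName.lastIndexOf(File.separatorChar)) { return fileName; }
            return fileName.substring(0, index);
        }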
    }


    Set<Image> images; // a set of image files

    //constructor
    public DuplicateRemover(){
        images = new HashSet<>();
    } 

    /**
     * Gets all the files in the given folder
     * and adds each one to the images set
     * @param dirPath the folder that contains the images
     */
    public void run(String dirPath){
        File dir = new File(dirPath);
        File[] listOfImages = dir.listFiles();
        if (listOfImages == null) { return; } // not a directory, or it cannot be read
        for (File f : listOfImages){
            if (f.isFile()) {
                images.add(new Image(f.getAbsolutePath()));
            }
        }
    }


    //TEST
    public static void main(String[] args) {
        String dirPath = "C:\\Users\\Yahya Almardeny\\Desktop\\folder";
        /* dir contains: {image1.png, image1.jpeg, image1.jpg, image2.png}       */
        DuplicateRemover dr = new DuplicateRemover();
        // the images set will delete any duplicate image from the folder
        // according to the logic we provided in the "equals()" method
        dr.run(dirPath); 

        // print the images left in the folder
        for(Image image : dr.images) {
            System.out.println(image.path);
        }

        // Note that you can use the set for further processing later if you need to
    }

}

Result

C:\Users\Yahya Almardeny\Desktop\folder\image1.jpeg
C:\Users\Yahya Almardeny\Desktop\folder\image2.png
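For comparison, here is a minimal sketch of the same idea without side effects inside `equals()` (deleting a file there makes repeated calls return different results, which breaks the consistency rule of the `equals` contract that `HashSet` relies on). It keys a `HashMap` by base name and deletes every later file whose base name is already present. The names `BaseNameDeduplicator` and `removeDuplicates` are hypothetical, and it assumes all images sit directly in one folder; files are sorted by name so that which copy survives does not depend on the listing order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BaseNameDeduplicator {

    // "image1.jpeg" -> "image1"; names without a dot are returned unchanged
    static String baseName(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot == -1 ? name : name.substring(0, dot);
    }

    /** Deletes every regular file in dir whose base name was already seen; returns the kept files. */
    static List<Path> removeDuplicates(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            // sort so the result does not depend on the file system's listing order
            files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, Path> kept = new HashMap<>();
        for (Path file : files) {
            // putIfAbsent returns null only for the first file with this base name
            if (kept.putIfAbsent(baseName(file), file) != null) {
                Files.delete(file);
            }
        }
        return new ArrayList<>(kept.values());
    }

    public static void main(String[] args) throws IOException {
        for (Path p : removeDuplicates(Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

With the example folder from the answer ({image1.png, image1.jpeg, image1.jpg, image2.png}), the name ordering keeps image1.jpeg and image2.png.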

Upvotes: 1

Leviand

Reputation: 2805

One way to achieve this, IMHO, is to create a helper class:

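import java.io.File;

public class FileUtil {
    String fileName;
    File file;
    boolean delete = true;

    public FileUtil(String fileName, File file) {
        // strip the extension: use the LAST dot, and keep names without a dot unchanged
        int dot = fileName.lastIndexOf('.');
        this.fileName = (dot == -1) ? fileName : fileName.substring(0, dot);
        this.file = file;
    }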

    public String getFileName() {
        return fileName;
    }
    public void setFileName(String fileName) {
        this.fileName = fileName;
    }
    public File getFile() {
        return file;
    }
    public void setFile(File file) {
        this.file = file;
    }
    public boolean isDelete() {
        return delete;
    }
    public void setDelete(boolean delete) {
        this.delete = delete;
    }

    @Override
    public String toString() {
        return "FileUtil [fileName=" + fileName + ", file=" + file + ", delete=" + delete + "]";
    }

}

Then you can use it to collect and delete your items:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Files.list only reads the folder itself; Files.walk would also group files from subfolders
try (Stream<Path> paths = Files.list(Paths.get("c:/yourPath/"))) {
    List<FileUtil> listUtil = paths
            .filter(Files::isRegularFile)
            .map(Path::toFile)
            .map(file -> new FileUtil(file.getName(), file))
            .collect(Collectors.toList());

    // group the files by their name without extension
    Map<String, List<FileUtil>> collect = listUtil.stream()
            .collect(Collectors.groupingBy(FileUtil::getFileName));

    for (List<FileUtil> list : collect.values()) {
        if (list.size() > 1) {
            // keep the first file of the group and delete the others
            list.get(0).setDelete(false);

            list.stream()
                    .filter(FileUtil::isDelete)
                    .forEach(fileUtil -> fileUtil.getFile().delete());
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

This way I'm keeping an arbitrary item (the first file in each group). If you prefer, you can modify the code to keep only the extension you want, for example .png.
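A sketch of that modification, assuming you want one preferred extension per base name: group the files by base name, keep the file with the preferred extension when a group has one (compared case-insensitively), and otherwise fall back to the first file by name. `PreferredExtensionKeeper` and `keepPreferred` are hypothetical names, and the sketch only looks at files directly inside one folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PreferredExtensionKeeper {

    // "image1.png" -> "image1"; names without a dot are returned unchanged
    static String baseName(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot == -1 ? name : name.substring(0, dot);
    }

    // case-insensitive check, so "IMAGE1.PNG" counts as a .png file
    static boolean hasExtension(Path file, String ext) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT)
                .endsWith("." + ext.toLowerCase(Locale.ROOT));
    }

    /** Keeps one file per base name (preferring ext), deletes the others, returns the kept files. */
    static List<Path> keepPreferred(Path dir, String ext) throws IOException {
        Map<String, List<Path>> groups;
        try (Stream<Path> files = Files.list(dir)) {
            groups = files.filter(Files::isRegularFile)
                    .sorted() // groupingBy keeps this order inside each group
                    .collect(Collectors.groupingBy(PreferredExtensionKeeper::baseName));
        }
        List<Path> kept = new ArrayList<>();
        for (List<Path> group : groups.values()) {
            Path keep = group.stream()
                    .filter(p -> hasExtension(p, ext))
                    .findFirst()
                    .orElse(group.get(0)); // no preferred extension: keep the first by name
            kept.add(keep);
            for (Path p : group) {
                if (!p.equals(keep)) {
                    Files.delete(p);
                }
            }
        }
        return kept;
    }
}
```

For example, `keepPreferred(Paths.get("c:/yourPath/"), "png")` on {image1.jpg, image1.jpeg, image1.png, image2.jpeg, image2.jpg} keeps image1.png and image2.jpeg.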

I hope this helps :)

Upvotes: 1

achAmháin

Reputation: 4266

Read all of your files into a List:

List<File> filesInFolder;
try (Stream<Path> paths = Files.walk(Paths.get("\\path\\to\\folder"))) { // close the stream when done
    filesInFolder = paths
            .filter(Files::isRegularFile)
            .map(Path::toFile)
            .collect(Collectors.toList());
}

Then loop through them and delete every file that doesn't end with the extension you want (note that this also deletes images that have no copy with that extension):

filesInFolder.stream()
        .filter(file -> !file.getName().endsWith(".jpg"))
        .forEach(File::delete);

You can tailor this to your specific needs.
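One way to tailor it, sketched here under the assumption that .jpg is the copy to keep: only delete a non-.jpg file when a .jpg file with the same base name exists in the same folder, so images that exist only as .png or .jpeg are left alone. `JpgOnlyWhenDuplicated` and `deleteNonJpgDuplicates` are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JpgOnlyWhenDuplicated {

    /** Deletes each non-.jpg file in dir that has a .jpg sibling with the same base name; returns the deleted files. */
    static List<File> deleteNonJpgDuplicates(File dir) {
        List<File> deleted = new ArrayList<>();
        File[] files = dir.listFiles(File::isFile); // null if dir is not a readable directory
        if (files == null) {
            return deleted;
        }
        for (File file : files) {
            String name = file.getName();
            int dot = name.lastIndexOf('.');
            if (dot == -1 || name.endsWith(".jpg")) {
                continue; // no extension, or already the copy we keep
            }
            File jpg = new File(dir, name.substring(0, dot) + ".jpg");
            // only delete when the .jpg copy really exists
            if (jpg.isFile() && file.delete()) {
                deleted.add(file);
            }
        }
        return deleted;
    }
}
```

On {image1.jpg, image1.png, image1.jpeg, image2.png} this deletes image1.png and image1.jpeg and leaves image2.png in place.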

Upvotes: 1
