Kiran Bhat

Reputation: 3847

Identifying duplicate numbers in a text file using a HashSet

I wrote code that prints the repeated numbers in a text file. I assumed the file contains only one integer per line. As written, it prints each repeated integer once.

The path of the text file is hard-coded.

I used two HashSets to implement it. Can the same thing be done with only one HashSet? If so, how would I implement it?

import java.io.*;
import java.util.*;

public class FileRead {

    public static void main(String[] args) {
        // Lines seen at least once, and lines already reported as duplicates.
        HashSet<String> uniquelines = new HashSet<String>();
        HashSet<String> duplicatelines = new HashSet<String>();

        try {
            FileInputStream fstream = new FileInputStream("C:/Users/LENOVO/Desktop/txt.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            ArrayList arr = new ArrayList();
            String str;
            while ((str = br.readLine()) != null) {
                if (uniquelines.contains(str)) {
                    // Print a duplicate only the first time it repeats.
                    if (!duplicatelines.contains(str)) {
                        duplicatelines.add(str);
                        System.out.println(str);
                    }
                } else {
                    uniquelines.add(str);
                }
            }
            in.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

Upvotes: 2

Views: 1609

Answers (2)

Renato

Reputation: 13700

You don't need to check whether uniquelines already contains the string; just add it anyway. The HashSet does the check itself and will not store duplicates: add() returns false if the element was already present. See the code below.

If you don't mind printing a duplicate every time it repeats rather than only once (maybe you're printing it just for testing?), you don't need the duplicates set in the code below. If you do want each duplicate printed only once, there's no way to do it without keeping track of which duplicates you've already found, so yes, you would need the two sets.

public static void main(String[] args) {
    Set<String> uniquelines = new HashSet<String>();
    Set<String> duplicates = new HashSet<String>();
    BufferedReader br = null;
    try {
        FileInputStream fstream = new FileInputStream("C:/Users/LENOVO/Desktop/txt.txt");
        br = new BufferedReader(new InputStreamReader(fstream));
        String str;
        while ((str = br.readLine()) != null) {
            // add() returns false when the set already contains the line.
            boolean duplicate = !uniquelines.add(str);
            if (duplicate) {
                // Print each duplicate only once.
                if (!duplicates.contains(str)) {
                    System.out.println(str);
                    duplicates.add(str);
                }
            }
        }
    } catch (Exception e) {
        System.out.println(e);
    } finally {
        try {
            // br stays null if opening the file failed.
            if (br != null) {
                br.close();
            }
        } catch (Exception e2) {
            // Nothing useful to do if closing fails.
        }
    }
}
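
For comparison, here is a minimal sketch of the single-set variant mentioned above, where a line is reported every time it repeats instead of only once. The class and method names are made up for illustration, and the lines are passed in as a list rather than read from the file so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SingleSetRepeats {

    // Hypothetical helper: returns a line once for every repeated occurrence,
    // so a value that appears three times is reported twice.
    static List<String> repeatedOccurrences(Iterable<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> repeats = new ArrayList<>();
        for (String line : lines) {
            // add() returns false when the line is already in the set.
            if (!seen.add(line)) {
                repeats.add(line);
            }
        }
        return repeats;
    }

    public static void main(String[] args) {
        // "1" appears three times, so it is reported twice.
        System.out.println(repeatedOccurrences(Arrays.asList("1", "2", "1", "1")));
    }
}
```

With one set there is nothing to remember that "1" was already reported, which is why the second set is needed for print-once behaviour.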

Upvotes: 1

JB Nizet

Reputation: 692023

To keep the existing functionality, I don't see how you could use a single HashSet. You could, however, use a single HashMap where each key is a line and its value is the number of occurrences of that line in the file.
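
A rough sketch of that HashMap idea, assuming Java 8 or later for Map.merge. The names DuplicateLines and findDuplicates are hypothetical, and the lines are passed in as a list instead of being read from the file, to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateLines {

    // Hypothetical helper: returns each line that occurs more than once,
    // exactly once, in the order in which its first repetition is seen.
    static List<String> findDuplicates(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            // merge() stores 1 for a new key, otherwise adds 1 to the old
            // count, and returns the updated count.
            int count = counts.merge(line, 1, Integer::sum);
            // Report a line only when it is seen for the second time.
            if (count == 2) {
                duplicates.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        for (String dup : findDuplicates(Arrays.asList("1", "2", "1", "3", "2", "1"))) {
            System.out.println(dup);
        }
    }
}
```

The count doubles as the "already reported" flag: a line is printed only when its count reaches 2, so later repeats are ignored without a second collection.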

Side notes:

  • streams, readers and writers should always be closed in a finally block.
  • your arr variable isn't useful.
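
On the first side note: since Java 7, try-with-resources gives the same guarantee as closing in a finally block with less code. The sketch below is only an illustration; ClosingReaders and readAllLines are made-up names, and it reads from a StringReader so it runs without the file from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ClosingReaders {

    // Hypothetical helper: reads every line from the source and closes it,
    // even if readLine() throws part-way through.
    static List<String> readAllLines(Reader source) {
        List<String> lines = new ArrayList<>();
        // The reader declared here is closed automatically when the block exits.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readAllLines(new StringReader("10\n20\n10\n")));
    }
}
```

For the file in the question, the same pattern would wrap a FileReader (or an InputStreamReader around a FileInputStream) in the try header.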

Upvotes: 3
