Reputation: 1838

How to remove duplicates from a dataset in Java?

Let's say a dataset looks like this:

Sno  country  noOfDeaths
1    India    3245325
2    America  234523
3    UK       3432523
3    UK       3432523

Here the row with Sno 3 is duplicated, and I want to remove the entire duplicate row:

3    UK       3432523

This last line should be removed.

Here is the code I use to read the dataset:

data_reader.java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class data_reader {
    String filePath = "src\\covid_19_data.csv";
    BufferedReader reader = null;
    String line = "";

    public void readDataSet() {
        try {
            reader = new BufferedReader(new FileReader(filePath));
            while ((line = reader.readLine()) != null) {
                String[] row = line.split(",");
                // Print each cell left-aligned in a 10-character column
                for (String cell : row) {
                    System.out.printf("%-10s", cell);
                }
                System.out.println();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // reader stays null if the file could not be opened
                if (reader != null) {
                    reader.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Main.java

public class Main {
    public static void main(String[] args) {
        data_reader obj = new data_reader();
        obj.readDataSet();
    }
}

Please help me with how to do this.

Update:


import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class data_reader {
    String filePath = "src\\abc.csv";
    BufferedReader reader = null;
    String line = "";
    String duplicateLine = "";
    Set<String> idSet = new HashSet<String>();

    public void readDataSet() {
        try {
            reader = new BufferedReader(new FileReader(filePath));
            while ((line = reader.readLine()) != null) {
                String[] row = line.split(",");
                // Only the first column (Sno) is stored, so the rest of the row is lost
                idSet.add(row[0]);

//              for(String index:row) {
//              System.out.printf("%-10s",index);
//              }
                System.out.println();
            }
            System.out.print(idSet);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // reader stays null if the file could not be opened
                if (reader != null) {
                    reader.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Output:

[1, 2, 3, Sno]

This drops the duplicated last line, but how do I print the output like this?

Sno  country  noOfDeaths
1    India    3245325
2    America  234523
3    UK       3432523

Upvotes: 0

Answers (3)

TimeToCode

Reputation: 1838

Here you go! This keeps every line in a HashSet and prints a line only the first time it is seen:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class data_reader {
    String filePath = "src\\abc.csv";
    // Every distinct line seen so far
    Set<String> lines = new HashSet<>();

    public void readDataSet() {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = br.readLine()) != null) {
                // add() returns false for a line that is already in the set
                if (lines.add(line)) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException, a subclass of IOException
            e.printStackTrace();
        }
    }
}

Output:

Sno,Country,noofDeaths
1,d,32432
2,f,32432
3,f,3332

Upvotes: 0

Basil Bourque

Reputation: 339858

Record

Define a record to hold your data.

record Sample ( int sno, String country, long noOfDeaths ) {}

For a record, the compiler implicitly creates overrides of the equals and hashCode methods. Those implementations consider each and every member field, so two Sample objects with the same sno, country, and noOfDeaths are equal.
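A minimal sketch of that behavior, assuming Java 16 or later (where records are a standard feature). The wrapper class name RecordEquality is only illustrative and does not come from the question:

```java
class RecordEquality {
    record Sample(int sno, String country, long noOfDeaths) {}

    public static void main(String[] args) {
        Sample a = new Sample(3, "UK", 3432523L);
        Sample b = new Sample(3, "UK", 3432523L);
        Sample c = new Sample(3, "UK", 1L);

        // Two distinct objects with identical field values are equal
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        // A difference in any one field makes them unequal
        System.out.println(a.equals(c));                  // false
    }
}
```

This is what lets a Set recognize the second "3, UK, 3432523" row as a duplicate without any hand-written equals or hashCode.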

Set

Instantiate one object per row of input data and collect the objects into a Set. A Set disallows duplicates, so any duplicate being added is ignored.

Set< Sample > samples = new HashSet<>();
…
samples.add( new Sample( … ) ) ;
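Put together, the record and the Set might be used roughly as follows. This is a sketch under some assumptions: the rows are plain comma-separated values with no quoted commas, the header line has already been skipped, and the names RecordDedup, parse, and distinct are hypothetical helpers. It also swaps in a LinkedHashSet for the HashSet above so that rows come back in file order:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecordDedup {
    record Sample(int sno, String country, long noOfDeaths) {}

    // Parses one CSV data row such as "3,UK,3432523" (assumes exactly three columns)
    static Sample parse(String line) {
        String[] cells = line.split(",");
        return new Sample(
                Integer.parseInt(cells[0].trim()),
                cells[1].trim(),
                Long.parseLong(cells[2].trim()));
    }

    // Returns the distinct rows in the order they were first seen
    static Set<Sample> distinct(List<String> dataRows) {
        Set<Sample> samples = new LinkedHashSet<>();
        for (String row : dataRows) {
            samples.add(parse(row)); // a duplicate is silently ignored
        }
        return samples;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "1,India,3245325",
                "2,America,234523",
                "3,UK,3432523",
                "3,UK,3432523");

        System.out.printf("%-5s%-10s%s%n", "Sno", "country", "noOfDeaths");
        for (Sample s : distinct(rows)) {
            System.out.printf("%-5d%-10s%d%n", s.sno(), s.country(), s.noOfDeaths());
        }
    }
}
```

In a real program the rows would come from the BufferedReader loop in the question instead of a hard-coded list.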

NavigableSet

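You may want to sort your distinct objects. Use a NavigableSet such as TreeSet, passing a Comparator object to specify the desired ordering. Be aware that a TreeSet decides what counts as a duplicate by the Comparator rather than by equals: two elements that compare as equal are treated as the same element, so comparing on noOfDeaths alone would also drop distinct rows that happen to share a death count.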

The getter methods in a record share the same name as their respective member field. They do not follow the JavaBeans naming convention of a get/is prefix, which is why the method reference below is Sample :: noOfDeaths rather than Sample :: getNoOfDeaths.

NavigableSet < Sample > samples = 
        new TreeSet<>( 
            Comparator.comparingLong( Sample :: noOfDeaths ) 
        );
…
samples.add( new Sample( … ) ) ;
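One way to keep rows that share a death count is to add tie-breakers to the Comparator, so that only rows equal in every field compare as equal. The sketch below illustrates this. The class name SortedSamples and the method sortedByDeaths are hypothetical, and the tie-breaking order (sno, then country) is an arbitrary choice:

```java
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class SortedSamples {
    record Sample(int sno, String country, long noOfDeaths) {}

    // Sorts by noOfDeaths; sno and country break ties so that only
    // rows identical in every field are treated as duplicates
    static NavigableSet<Sample> sortedByDeaths(Iterable<Sample> input) {
        NavigableSet<Sample> samples = new TreeSet<>(
                Comparator.comparingLong(Sample::noOfDeaths)
                        .thenComparingInt(Sample::sno)
                        .thenComparing(Sample::country));
        for (Sample s : input) {
            samples.add(s);
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> input = List.of(
                new Sample(1, "India", 3245325L),
                new Sample(2, "America", 234523L),
                new Sample(3, "UK", 3432523L),
                new Sample(3, "UK", 3432523L));

        // Prints the three distinct rows in ascending order of noOfDeaths
        for (Sample s : sortedByDeaths(input)) {
            System.out.printf("%-5d%-10s%d%n", s.sno(), s.country(), s.noOfDeaths());
        }
    }
}
```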

Upvotes: 2

Nico S.

Reputation: 151

Create a Set<Integer> idSet and put the IDs into it. If idSet.add() returns false, the ID has been seen before, so drop the current line.

The Javadoc for Set.add describes this behavior:

Adds the specified element to this set if it is not already present (optional operation). More formally, adds the specified element e to this set if the set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements.
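A minimal sketch of this approach, assuming simple comma-separated rows with the ID in the first column. The names DedupById and dedupBySno are hypothetical, and the set holds String IDs rather than Integer so that the header value "Sno" can go through the same check without a parse error:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupById {
    // Keeps the first row seen for each value in the first column (the Sno)
    static List<String> dedupBySno(List<String> lines) {
        Set<String> seenIds = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String id = line.split(",")[0].trim();
            // add() returns false when the id is already in the set
            if (seenIds.add(id)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Sno,country,noOfDeaths",
                "1,India,3245325",
                "2,America,234523",
                "3,UK,3432523",
                "3,UK,3432523");

        // Print the surviving rows in aligned columns, as in the question
        for (String line : dedupBySno(lines)) {
            for (String cell : line.split(",")) {
                System.out.printf("%-12s", cell);
            }
            System.out.println();
        }
    }
}
```

Note that this treats any two rows with the same Sno as duplicates even if their other columns differ. Comparing whole lines or whole records is stricter.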

Upvotes: 0
