TimeToCode

Reputation: 1838

How to remove row which contains blank cell from csv file in Java

I'm trying to clean a dataset. By data cleaning I mean removing every row that contains NaN, a duplicate value, or an empty cell. My code is shown below.

The dataset looks like this:

Sno Country     noofDeaths
1                32432
2    Pakistan     NaN
3    USA          3332
3    USA          3332

public class data_reader {
    String filePath="src\\abc.csv";
    public void readData() {
         BufferedReader br = null;
            String line = "";
          
            HashSet<String> lines = new HashSet<>();
            try {
                br = new BufferedReader(new FileReader(filePath));
                while ((line = br.readLine()) != null) {
                    if(!line.contains("NaN") || !line.contains("")) {
                        if (lines.add(line)) {
                            System.out.println(line);
                        }   
                    }
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (br != null) {
                    try {
                        br.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
    }   
    }
    
    

It works fine for NaN values and duplicate rows, but not for empty cells. How can I handle those?

!line.contains("")

This check is not working.

Upvotes: 1

Views: 985

Answers (2)

hfontanez

Reputation: 6188

Seems to me this is a pretty easy problem to solve. Given a CSV file with an empty row

foo,bar,baz
1,One,123
,,
2,Two,456
3,Three,789

You can read the lines and define an empty line as one which contains empty strings separated by commas. You could read the contents of the file, store the populated lines into a string buffer, and then save the contents of the buffer once the empty lines are extracted out. The code below accomplishes this:

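import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// The class name is arbitrary; it only wraps the two methods so they compile.
public class RemoveEmptyRows {
    public static void main(String[] args) throws IOException {
        String file = "test.csv";
        StringBuilder sbuff = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.split(",");
                if (containsText(tokens)) {
                    sbuff.append(line).append("\n");
                }
            }
        }
        System.out.println(sbuff.toString());
        // save file here
    }

    // Returns true if at least one cell in the row is non-empty
    public static boolean containsText(String[] tokens) {
        for (String token : tokens) {
            if (token.length() > 0)
                return true;
        }
        return false;
    }
}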

After running the code, the output is:

foo,bar,baz
1,One,123
2,Two,456
3,Three,789
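
The `// save file here` step is left open in the code above. One possible way to fill it in is sketched below with `java.nio.file.Files`: collect the kept lines in a list and write them out in one call. The output name `cleaned.csv`, the class name `SaveCleanedCsv`, and the method name `dropEmptyRows` are assumptions made for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
final class SaveCleanedCsv {
    // Keeps only rows that have at least one non-empty cell.
    static List<String> dropEmptyRows(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.split(",")) {
                if (token.length() > 0) {
                    kept.add(line);
                    break; // one non-empty cell is enough to keep the row
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get("test.csv");      // input name used in the answer
        Path out = Paths.get("cleaned.csv");  // assumed output name
        if (!Files.exists(in)) {
            System.err.println("Input not found: " + in);
            return;
        }
        // Files.write writes each element as one line (UTF-8 by default).
        Files.write(out, dropEmptyRows(Files.readAllLines(in)));
    }
}
```

Writing to a separate output file rather than overwriting `test.csv` keeps the original data intact while the filter is being checked.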

The same approach can be used to detect a row that has any blank cell, with a simple method:

public static boolean isCellEmpty(String[] tokens) {
    for (String token: tokens) {
        if (token.isBlank())
            return true;
    }
    return false;
}
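
One caveat when the tokens come from `line.split(",")`: `String.split` without a limit drops trailing empty strings, so a row whose last cell is empty would go undetected. A minimal sketch that passes a negative limit to keep those cells follows. It assumes a plain comma-separated file with no quoted fields, and the class name `BlankCellCheck` and method name `hasBlankCell` are hypothetical.

```java
// Hypothetical helper class; names are illustrative only.
final class BlankCellCheck {
    // Returns true if any comma-separated cell in the row is blank.
    // The limit -1 keeps trailing empty strings: "2,Pakistan," -> ["2", "Pakistan", ""].
    static boolean hasBlankCell(String line) {
        for (String cell : line.split(",", -1)) {
            if (cell.isBlank()) { // isBlank() requires Java 11 or later
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("2,Pakistan,".split(",").length);     // 2
        System.out.println("2,Pakistan,".split(",", -1).length); // 3
        System.out.println(hasBlankCell("2,Pakistan,"));          // true
        System.out.println(hasBlankCell("3,USA,3332"));           // false
    }
}
```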

Upvotes: 0

Alexander Ivanchenko

Reputation: 29068

The condition !line.contains("") doesn't make sense, because every string contains the empty string, so the condition is always false.
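
A quick check makes this concrete. In the sketch below (the class name `ContainsDemo` and the sample rows are made up for illustration), `contains("")` returns true for any string, so the `||` in the question's condition reduces to the NaN test alone and rows with an empty cell still pass:

```java
// Illustrative only: class name and sample rows are hypothetical.
final class ContainsDemo {
    // The condition from the question, unchanged.
    static boolean questionCondition(String line) {
        return !line.contains("NaN") || !line.contains("");
    }

    public static void main(String[] args) {
        System.out.println("1,,32432".contains(""));             // true
        System.out.println("3,USA,3332".contains(""));           // true
        System.out.println(questionCondition("1,,32432"));       // true: the row with an empty cell passes
        System.out.println(questionCondition("2,Pakistan,NaN")); // false: only NaN rows are rejected
    }
}
```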

General suggestions:

  • don't hard-code the file path; the code should be reusable;
  • use try-with-resources;
  • use camel-case names.
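
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;

public class DataReader {
    public static void main(String[] args) {
        new DataReader().readData("src\\abc.csv");
    }

    public void readData(String filePath) {
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            HashSet<String> lines = new HashSet<>();
            String line;
            while ((line = br.readLine()) != null) {
                // Skip rows with NaN or a blank cell; add() returns false for duplicate rows
                if (!line.contains("NaN") && !hasBlankCell(line) && lines.add(line)) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // The limit -1 keeps trailing empty cells, which split(",") alone would drop
    private static boolean hasBlankCell(String line) {
        for (String cell : line.split(",", -1)) {
            if (cell.isBlank()) {
                return true;
            }
        }
        return false;
    }
}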

Upvotes: 1
