Reputation: 1838
I'm trying to do data cleaning on a dataset. By data cleaning I mean removing any row that contains NaN, an empty cell, or a duplicate of another row.
The dataset looks like this:
Sno,Country,noofDeaths
1,,32432
2,Pakistan,NaN
3,USA,3332
3,USA,3332
Here is my code:
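import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;

public class data_reader {
    String filePath = "src\\abc.csv";

    public void readData() {
        BufferedReader br = null;
        String line = "";
        HashSet<String> lines = new HashSet<>();
        try {
            br = new BufferedReader(new FileReader(filePath));
            while ((line = br.readLine()) != null) {
                // intended to skip rows containing NaN or an empty cell
                if (!line.contains("NaN") || !line.contains("")) {
                    // HashSet.add returns false for a line already seen, so duplicates are skipped
                    if (lines.add(line)) {
                        System.out.println(line);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}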
It works fine for NaN values and duplicate rows, but not for empty cells. How can I handle those? This check is not working:
!line.contains("")
Upvotes: 1
Views: 985
Reputation: 6188
This seems like a fairly easy problem to solve. Given a CSV file with an empty row:
foo,bar,baz
1,One,123
,,
2,Two,456
3,Three,789
You can read the lines and treat a line as empty if it contains only empty strings separated by commas. Read the file, append each populated line to a StringBuilder, and save the buffer's contents once the empty lines have been filtered out. The code below does this:
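import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class RemoveEmptyRows {
    public static void main(String[] args) throws IOException {
        String file = "test.csv";
        StringBuilder sbuff = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.split(",");
                // keep only lines with at least one non-empty token
                if (containsText(tokens)) {
                    sbuff.append(line).append('\n');
                }
            }
        }
        System.out.println(sbuff);
        // save file here
    }

    // Returns true if at least one token is non-empty.
    // ",,".split(",") yields an empty array, so an all-comma row returns false.
    public static boolean containsText(String[] tokens) {
        for (String token : tokens) {
            if (token.length() > 0) {
                return true;
            }
        }
        return false;
    }
}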
After running the code, the output is:
foo,bar,baz
1,One,123
2,Two,456
3,Three,789
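The code above leaves the output step as a "// save file here" comment. One way to fill it in is sketched below with java.nio.file.Files, assuming the cleaned rows should go to a separate file; the class SaveCleaned, the method clean and the output name test-clean.csv are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SaveCleaned {
    // Same rule as containsText: keep a line if at least one token is non-empty.
    static boolean containsText(String line) {
        for (String token : line.split(",")) {
            if (!token.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Reads input, drops rows with no text, writes the remaining rows to output.
    static List<String> clean(Path input, Path output) throws IOException {
        List<String> kept = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            if (containsText(line)) {
                kept.add(line);
            }
        }
        Files.write(output, kept, StandardCharsets.UTF_8);
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // "test-clean.csv" is a hypothetical output name (Path.of needs Java 11+)
        List<String> kept = clean(Path.of("test.csv"), Path.of("test-clean.csv"));
        System.out.println(kept.size() + " rows written");
    }
}
```

Writing to a new file rather than overwriting test.csv keeps the raw data intact in case the filtering rule needs to change.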
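The same idea can be used to check whether a row contains an empty cell:
// Returns true if any token is empty or whitespace-only (String.isBlank needs Java 11+).
// Build tokens with line.split(",", -1): plain split(",") drops trailing empty strings.
public static boolean isCellEmpty(String[] tokens) {
    for (String token : tokens) {
        if (token.isBlank()) {
            return true;
        }
    }
    return false;
}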
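One caveat when feeding isCellEmpty: String.split(",") discards trailing empty strings, so a row such as 3,USA, (empty last cell) splits into only two tokens and looks complete. The sketch below splits with a limit of -1 instead so every cell is kept; SplitDemo and hasEmptyCell are hypothetical names, and isBlank requires Java 11 or later.

```java
import java.util.Arrays;

public class SplitDemo {
    // Returns true if any comma-separated cell is empty or whitespace-only.
    // The limit -1 makes split keep trailing empty strings.
    static boolean hasEmptyCell(String line) {
        for (String token : line.split(",", -1)) {
            if (token.isBlank()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("3,USA,".split(",")));     // [3, USA]
        System.out.println(Arrays.toString("3,USA,".split(",", -1))); // [3, USA, ]
        System.out.println(hasEmptyCell("1,,32432"));   // true
        System.out.println(hasEmptyCell("3,USA,"));     // true
        System.out.println(hasEmptyCell("3,USA,3332")); // false
    }
}
```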
Upvotes: 0
Reputation: 29068
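The condition !line.contains("") doesn't make sense, because every string contains the empty string. It is always false, so your if reduces to !line.contains("NaN") and rows with empty cells are never filtered out.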
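A quick way to confirm this is to evaluate the question's condition on sample rows. This is an illustrative sketch; the class ContainsEmptyDemo and the method originalFilter are hypothetical names that mirror the if in data_reader.

```java
public class ContainsEmptyDemo {
    // Mirrors the question's filter: a line is kept when this returns true.
    static boolean originalFilter(String line) {
        return !line.contains("NaN") || !line.contains("");
    }

    public static void main(String[] args) {
        // Every string, including "", contains the empty string.
        System.out.println("3,USA,3332".contains("")); // true
        System.out.println("".contains(""));           // true
        // So the second operand is always false and only the NaN test matters.
        System.out.println(originalFilter("1,,32432"));       // true (empty cell is kept)
        System.out.println(originalFilter("2,Pakistan,NaN")); // false
    }
}
```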
General suggestions:
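import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class DataReader {
    public static void main(String[] args) {
        new DataReader().readData("src\\abc.csv");
    }

    public void readData(String filePath) {
        // try-with-resources closes the reader automatically
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            Set<String> lines = new HashSet<>();
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains("NaN")) {
                    continue; // skip rows with NaN
                }
                boolean hasBlankCell = false;
                // limit -1 keeps trailing empty cells such as "3,USA,"
                for (String cell : line.split(",", -1)) {
                    if (cell.isBlank()) { // isBlank requires Java 11+
                        hasBlankCell = true;
                        break;
                    }
                }
                // print the whole row only if it has no blank cell and was not seen before
                if (!hasBlankCell && lines.add(line)) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}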
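On Java 11 or later, the same row-level filtering can also be written with streams. This is a sketch under stated assumptions, not the approach above: it assumes a comma-separated file, treats a row as complete when it has no NaN and no blank cell, and uses Stream.distinct() to drop exact duplicate lines while keeping the first occurrence in file order. StreamCleaner, isComplete and cleanLines are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamCleaner {
    // True if the row has no "NaN" and no blank cell (-1 keeps trailing empties).
    static boolean isComplete(String line) {
        if (line.contains("NaN")) {
            return false;
        }
        for (String cell : line.split(",", -1)) {
            if (cell.isBlank()) {
                return false;
            }
        }
        return true;
    }

    // Keeps complete rows in their original order, dropping exact duplicates.
    static List<String> cleanLines(List<String> lines) {
        return lines.stream()
                .filter(StreamCleaner::isComplete)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Files.lines is lazy; try-with-resources closes the underlying file.
        try (Stream<String> lines = Files.lines(Path.of("src", "abc.csv"))) {
            lines.filter(StreamCleaner::isComplete)
                 .distinct()
                 .forEach(System.out::println);
        }
    }
}
```

distinct() compares rows with String.equals, so rows that differ only in whitespace (for example 3,USA,3332 and 3, USA,3332) are treated as different.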
Upvotes: 1