Reputation: 115
I am able to read my dataset (a CSV file), but when I run my main class, it displays all the rows, including the rows with null values. Is there any way to ignore every row in the dataset that has a missing value (i.e. a null value)? I was thinking of checking for that in a method testNullValue(), but I don't really know what to check.
My class:
public static BufferedReader exTractTraningData(File datafile, String ListOfCharacteristics) throws IOException {
    try {
        // Create a BufferedReader to read the csv file
        BufferedReader reader = new BufferedReader(new FileReader(datafile));
        String strLine = "";
        StringTokenizer st = null;
        int lineNumber = 0, tokenNumber = 0;
        while ((strLine = reader.readLine()) != null) {
            lineNumber++;
            // Break the comma separated line using ","
            st = new StringTokenizer(strLine, ",");
            while (st.hasMoreTokens()) {
                // Display csv values
                tokenNumber++;
                System.out.println("Line # " + lineNumber
                        + ", Token : " + st.nextToken(","));
            }
            // Reset token number
            tokenNumber = 0;
        }
    } catch (Exception e) {
        System.out.println("Exception while reading csv file: " + e);
    }
    return null;
}
public boolean testNullValue(String ListOfCharacteristics, String ListOfValues) {
    return false;
}
And lastly, I don't know why the results in my console are not displaying each row like this: "name", "2 ", "TV ", "As ", " 40", "10", for example, even though I specified the delimiter here: st = new StringTokenizer(strLine, ",");
Upvotes: 1
Views: 5866
Reputation: 9192
StringTokenizer ignores empty values when it encounters them and really gives no way of knowing they actually exist within a comma-delimited line, other than having the tokenizer also provide the delimiters as tokens: when two delimiter tokens appear one after the other, an empty value was obviously encountered:
st = new StringTokenizer(strLine, ",", true);
This is a real booger of a way to detect null in a CSV file data line, since you would now have to supply code that counts when two delimiter tokens fall one after the other and otherwise ignores delimiter tokens altogether. This is most likely one of the reasons why not many people use StringTokenizer for parsing CSV files and prefer something like the String#split() method instead, or better yet a CSV parser library like OpenCSV. This of course depends upon what really needs to be done and how extensive it will be.
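To make that concrete, here is a minimal, self-contained sketch of the delimiter-counting idea (the sample line "a,,c" and the class name are hypothetical, not from the original post):

import java.util.StringTokenizer;

public class DelimiterCountDemo {
    public static void main(String[] args) {
        String strLine = "a,,c";   // hypothetical line with an empty field
        StringTokenizer st = new StringTokenizer(strLine, ",", true);
        String previous = ",";     // pretend a delimiter precedes the line
        boolean hasEmptyValue = false;
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            // Two delimiter tokens in a row mean an empty field.
            if (token.equals(",") && previous.equals(",")) {
                hasEmptyValue = true;
            }
            previous = token;
        }
        // A trailing delimiter also means a trailing empty field.
        if (previous.equals(",")) {
            hasEmptyValue = true;
        }
        System.out.println(hasEmptyValue);   // prints: true
    }
}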
Use of the old legacy StringTokenizer class in new code is actually discouraged, since its methods do not distinguish among identifiers, numbers, and quoted strings. The class methods don't even recognize and skip comments.
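For comparison, here is a hedged sketch using the OpenCSV library mentioned above (assuming the com.opencsv artifact is on the classpath; the file name "data.csv" is a placeholder). CSVReader#readNext() returns each row as a String[] and preserves empty fields, so skipping rows with missing values becomes a simple check:

import com.opencsv.CSVReader;
import java.io.FileReader;

public class OpenCsvSkipNulls {
    public static void main(String[] args) throws Exception {
        // "data.csv" is a placeholder path, not from the original post.
        try (CSVReader csvReader = new CSVReader(new FileReader("data.csv"))) {
            String[] row;
            while ((row = csvReader.readNext()) != null) {
                boolean hasEmpty = false;
                for (String field : row) {
                    if (field == null || field.trim().isEmpty()) {
                        hasEmpty = true;   // missing value found in this row
                        break;
                    }
                }
                if (hasEmpty) {
                    continue;              // skip rows with missing values
                }
                System.out.println(String.join(", ", row));
            }
        }
    }
}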
In any case, if you want to check for null values within any single CSV line, you don't need to re-read the file. It can be done in the same single-pass read you are currently doing. The concept is quite simple: utilize a code mechanism that takes any CSV file data line read in, splits it into tokens in a way that preserves the null values the line might contain, and then compares that token count against the StringTokenizer count for the very same line. This sort of thing can be done directly after the CSV data line has been tokenized, for example:
while ((strLine = reader.readLine()) != null) {
    // You might want to count lines only if they are valid!
    // If so, then move this line below the IF statement code block.
    lineNumber++;
    // Break the comma separated line using ","
    st = new StringTokenizer(strLine, ",");
    // Is this a blank line, OR is there possibly a null token in the
    // data line detected by the String#split() method? The -1 limit
    // makes split() keep trailing empty strings, so a trailing null
    // value is caught as well.
    if (st.countTokens() == 0 || (st.countTokens() != strLine.split(",", -1).length)) {
        System.out.println("The data line is blank OR there is a null value "
                + "in the data line!");
        // Skip this data line from further processing within the WHILE loop.
        continue;
    }
    while (st.hasMoreTokens()) {
        // Display csv values
        tokenNumber++;
        System.out.println("Line # " + lineNumber
                + ", Token : " + st.nextToken(","));
    }
    // Reset token number
    tokenNumber = 0;
}
I would personally just make use of the String#split() method and not bother with the StringTokenizer class at all, perhaps something like this for example:
while ((strLine = reader.readLine()) != null) {
    // You might want to count lines only if they are valid!
    // If so, then move this line below the IF statement code block.
    lineNumber++;
    // Split the comma separated line using "," (the -1 limit keeps
    // trailing empty strings, so trailing null values are detected).
    String[] st = strLine.split(",", -1);
    // Requires: import java.util.Arrays;
    if (st.length == 0 || Arrays.asList(st).contains("")) {
        System.out.println("The data line (" + lineNumber + ") is blank OR "
                + "there is a null value in the data line!");
        // Skip this data line from further processing within the WHILE loop.
        continue;
    }
    StringBuilder sb = new StringBuilder();
    sb.append("Line# ").append(lineNumber).append(": ");
    for (int i = 0; i < st.length; i++) {
        sb.append("Token : ").append(st[i])
                // Ternary operator used here to add commas
                .append(i < (st.length - 1) ? ", " : "");
    }
    System.out.println(sb.toString());
}
Of course, this all assumes that the CSV file data is comma-delimited with no whitespace before or after any delimiter. This is the problem when people post questions about data file handling and provide no example of how the data is formatted within that file. This of course brings me to your second problem, as to why things don't display the way you intend:
And lastly, I don't know why the results in my console are not displaying each row like this: "name", "2 ", "TV ", "As ", " 40", "10"
Who knows, without an example of how the data is presented in the file and exactly how you want it presented on screen. What is the example supposed to be? I personally don't understand it. Besides, shouldn't it be "name", "gender", "2 " ...?
We can of course guess, and my guess would be that the delimiter used within your StringTokenizer calls is wrong; of course, all the examples above are based on the delimiter you provided within your own code.
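As a final hedged guess: if the file does contain whitespace around the delimiters, as tokens like "2 " and " 40" suggest, you could split on a small regular expression that consumes that whitespace. A minimal sketch (the sample line and class name are hypothetical):

import java.util.Arrays;

public class TrimSplitDemo {
    public static void main(String[] args) {
        // Hypothetical sample line with whitespace around some fields.
        String strLine = "name, 2 ,TV ,As , 40,10";
        // \s* consumes optional whitespace on either side of each comma;
        // the -1 limit keeps any trailing empty fields. Note that this
        // trims the field values as a side effect.
        String[] tokens = strLine.trim().split("\\s*,\\s*", -1);
        System.out.println(Arrays.toString(tokens));
        // prints: [name, 2, TV, As, 40, 10]
    }
}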
Upvotes: 1