Reputation: 109
I am reading a .dat file line by line and want to split each line into fields on the tab delimiter ("\t"), since every field is separated by a tab.
However, some fields are optional and can be blank, so when there are two consecutive tabs ("\t\t") I want to detect the empty field between them and store a blank String.
StringTokenizer stringTokenizer = new StringTokenizer(line, "\t");
ArrayList<String> al = new ArrayList<>();
while (stringTokenizer.hasMoreTokens()) {
    al.add(stringTokenizer.nextToken());
}
System.out.println(al.size() + " >> " + al);
When I try the above with the following input lines:
R 900081458 22222-22-2 1 -1 1 0 0 1
R 245047685 7250-46-6 0 -1 0 0 0 0
R 245048731 13755-29-8 237-340-6 0 -1 0 0 0 0
R 245047201 1080-12-2 214-096-9 0 -1 0 0 0 0
R 1 118725-24-9 612-118-00-5 405-080-4 0 0 0 0 0 0
I can't handle the two consecutive tabs, so I have the following output:
9 >> [R, 900081458, 22222-22-2, 1, -1, 1, 0, 0, 1]
9 >> [R, 245047685, 7250-46-6, 0, -1, 0, 0, 0, 0]
10 >> [R, 245048731, 13755-29-8, 237-340-6, 0, -1, 0, 0, 0, 0]
10 >> [R, 245047201, 1080-12-2, 214-096-9, 0, -1, 0, 0, 0, 0]
11 >> [R, 1, 118725-24-9, 612-118-00-5, 405-080-4, 0, 0, 0, 0, 0, 0]
While the desired output would be something like this (in case I fill the two consecutive blanks with "BLANK"):
11 >> [R, 900081458, 22222-22-2, "BLANK", "BLANK", 1, -1, 1, 0, 0, 1]
11 >> [R, 245047685, 7250-46-6, "BLANK", "BLANK", 0, -1, 0, 0, 0, 0]
11 >> [R, 245048731, 13755-29-8, 237-340-6, "BLANK", 0, -1, 0, 0, 0, 0]
11 >> [R, 245047201, 1080-12-2, 214-096-9, "BLANK", 0, -1, 0, 0, 0, 0]
11 >> [R, 1, 118725-24-9, 612-118-00-5, 405-080-4, 0, 0, 0, 0, 0, 0]
Upvotes: 0
Views: 95
Reputation: 5375
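StringTokenizer never returns empty tokens, so consecutive delimiters are collapsed into one; use String.split() instead. Pass a negative limit so that trailing blank fields are kept as well (without it, split() drops trailing empty strings). Try this:
// A limit of -1 keeps empty strings at the end of the array
String[] strings = line.split("\t", -1);
ArrayList<String> al = new ArrayList<>();
for (String string : strings) {
    al.add(string);
}
System.out.println(al.size() + " >> " + al);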
As k314159 suggests, using a library such as opencsv is a more robust choice.
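For reference, here is a minimal, self-contained sketch of the split-based approach that also substitutes a placeholder for blank fields, as the question asks. The class name TabFieldSplitter, the helper splitFields, and its placeholder parameter are made up for illustration, and the sample line assumes two empty fields after the third column, matching the first row of the desired output.

```java
import java.util.ArrayList;
import java.util.List;

class TabFieldSplitter {

    // Hypothetical helper (not from the original post): splits a line on tabs
    // and replaces every empty field with the given placeholder.
    static List<String> splitFields(String line, String placeholder) {
        // A negative limit keeps trailing empty strings, so blank fields
        // at the end of a line are not silently dropped.
        String[] parts = line.split("\t", -1);
        List<String> fields = new ArrayList<>(parts.length);
        for (String part : parts) {
            fields.add(part.isEmpty() ? placeholder : part);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumed sample line: two consecutive empty fields after the third column.
        String line = "R\t900081458\t22222-22-2\t\t\t1\t-1\t1\t0\t0\t1";
        List<String> fields = splitFields(line, "BLANK");
        // Prints: 11 >> [R, 900081458, 22222-22-2, BLANK, BLANK, 1, -1, 1, 0, 0, 1]
        System.out.println(fields.size() + " >> " + fields);
    }
}
```

If the file can contain quoted fields or embedded tabs, a plain split() is probably not enough, and a parser library is the safer route.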
Upvotes: 1