Reputation: 8358
I'm writing a utility class that should be able to parse delimited files (comma, pipe, colon delimited). It reads a particular line and needs to extract the most commonly used delimiter in that line. But this doesn't seem to be working as expected. When I call the getHeader()
method in my main method caller class, It seems like the only delimiter that's being acknowledged is a comma from the line. I think its my weak java and oop lacking skills preventing me from understanding the issue. Please advice. Thank you in advance. Below is my code:
public class Parser {
// sample line of data
String line = "There|is|data,in,this:file|hause";
private static class Delimiters {
static char DEFAULT = ',';
static char COMMA = ',';
static char SEMI = ';';
static char PIPE = '|';
static char COLON = ':';
};
public String[] getHeader() {
char delim = findDelimiter(line);
System.out.println("Header delim: " + delim);
String[] columns = line.split(String.valueOf(delim));
return columns;
}
// figure out the delimiter of the file. This method
// gets called on lines of file data
public char findDelimiter(String line) {
Delimiter dim = new Delimiter();
for (int i = 0; i < line.length(); i++) {
for (char delim : Arrays.asList(Delimiters.COLON, Delimiters.COMMA,
Delimiters.PIPE, Delimiters.SEMI)) {
if (delim == line.charAt(i)) {
dim.increaseDelimiterCnt(delim);
}
}
}
final char theLinesDelimiter = dim.mostCommonDelimiter();
return theLinesDelimiter;
}
private class Delimiter {
Map<Character, Integer> delimiterCounts = new HashMap<Character, Integer>();
private void increaseDelimiterCnt(char delim) {
System.out.println(delim);
int value = (delimiterCounts.containsKey(delim) ? delimiterCounts
.get(delim) : 0);
delimiterCounts.put(delim, value++);
System.out.println(getDelimiterCounts());
}
private Map<Character, Integer> getDelimiterCounts() {
return delimiterCounts;
}
/**
* Gets the delimiter based on greatest frequency of first line in file.
*
* @return String
*/
private char mostCommonDelimiter() {
char theDelimiter = ',';
System.out.println(delimiterCounts);
int maxValueInMap = (Collections.max(delimiterCounts.values()));
for (Map.Entry<Character, Integer> entry : delimiterCounts
.entrySet()) {
if (entry.getValue().equals(maxValueInMap)) {
theDelimiter = entry.getKey();
}
}
return theDelimiter;
}
}
}
Upvotes: 0
Views: 149
Reputation: 13427
You can just use split
to count the number of strings you get back. Just subtract 1 because the number of dividers is always 1 less then the number of parts.
public class Parser {
static String line = "There|is|data,in,this:file|hause";
private static final String[] DELIMS = {",", ";", "\\|", ":"};
private static Map<String, Integer> count = new HashMap<>();
public static void main(String[] args) {
for (String delim : DELIMS)
count.put(DELIMS[i], line.split(DELIMS[i]).length - 1);
}
}
Upvotes: 0
Reputation: 183514
The bug is in this line:
delimiterCounts.put(delim, value++);
++
after a variable-name is a post-increment, so although value++
increments value
, it still evaluates to the original value. So, the above is equivalent to this:
delimiterCounts.put(delim, value);
value = value + 1; // pointless, since we never use value again
Instead you should write:
delimiterCounts.put(delim, value + 1);
Your debugging output actually should have been enough to tell you this; it shows you that every delimiter in delimiterCounts
gets mapped to zero.
I suggest you read Eric Lippert's blog post, "How to debug small programs". You will probably find it useful for the future.
(Note: after fixing the above issue, you'll have another problem. Your program will now correctly identify |
as the delimiter, but line.split("|")
does not mean what you want. To fix that one, you'll want to use Pattern.quote
. I'll let you figure out the details.)
Edited to add: Since you have concerns about your OOP, I should also say something about the overall design. You may be able to create a more robust and efficient design by using an enumeration:
public enum Delimiter {
COMMA(','), SEMI(';'), PIPE('|'), COLON(':');
public final char c;
public final Pattern pattern;
private DelimiterChar(final char c) {
this.c = c;
this.pattern = Pattern.compile(Pattern.quote(Character.toString(c)));
}
}
You can then use Delimiter.values()
when you need to enumerate over the possible delimiters, and can use an EnumSet<Delimiter, Integer>
to store the counts by delimiter.
Note that I've used the name Delimiter
for something very different than you have. Your Delimiter
class should probably be called DelimiterCounter
, since its instances count delimiters, rather than themselves delimiting anything.
Upvotes: 2
Reputation: 106490
Your findDelmiter
method is broken. And, it's all one character's fault.
String#split
parses its arguments as a regular expression. The pipe character has special meaning in regex; that is, it is used to denote a branching match.
If you want to use a literal pipe, you have to escape it. You can't do that in a single character literal (as escaping only has contextual meaning for a String
, not a char
.
Why not just use a regular expression for the split method entirely? Let's not worry about picking out which one we used there, when we could rebuild that if need be.
Here's a snippet.
String[] columns = line.split(",|;|\\||:");
That complicated-looking regex is actually using the pipe character to branch - it will split on either a comma, a semi-colon, a pipe (properly escaped), or a colon.
I admire the fact that you're using a helper class to determine the counts of your delimiters. But, you would want to be careful about how you insert into the map - be sure that the key exists first, and if it does, pull the current value out of it, and add one to it, then put it back.
Here's a snippet.
private static void placeIntoMap(final Map<Character, Integer> counts, final char c) {
if(counts.containsKey(c)) {
counts.put(c, counts.get(c) + 1);
} else {
counts.put(c, 1);
}
}
Upvotes: 1