Horse Voice
Horse Voice

Reputation: 8358

Java use of object to hold state does not seem to be working

I'm writing a utility class that should be able to parse delimited files (comma, pipe, colon delimited). It reads a particular line and needs to extract the most commonly used delimiter in that line. But this doesn't seem to be working as expected. When I call the getHeader() method in my main method caller class, It seems like the only delimiter that's being acknowledged is a comma from the line. I think its my weak java and oop lacking skills preventing me from understanding the issue. Please advice. Thank you in advance. Below is my code:

    public class Parser {
    // sample line of data
    String line = "There|is|data,in,this:file|hause";

    private static class Delimiters {
        static char DEFAULT = ',';
        static char COMMA = ',';
        static char SEMI = ';';
        static char PIPE = '|';
        static char COLON = ':';
    };

    public String[] getHeader() {
        char delim = findDelimiter(line);
        System.out.println("Header delim: " + delim);
        String[] columns = line.split(String.valueOf(delim));
        return columns;
    }

    // figure out the delimiter of the file. This method
    // gets called on lines of file data
    public char findDelimiter(String line) {
        Delimiter dim = new Delimiter();
        for (int i = 0; i < line.length(); i++) {
            for (char delim : Arrays.asList(Delimiters.COLON, Delimiters.COMMA,
                    Delimiters.PIPE, Delimiters.SEMI)) {
                if (delim == line.charAt(i)) {
                    dim.increaseDelimiterCnt(delim);
                }
            }
        }

        final char theLinesDelimiter = dim.mostCommonDelimiter();
        return theLinesDelimiter;
    }

    private class Delimiter {
        Map<Character, Integer> delimiterCounts = new HashMap<Character, Integer>();

        private void increaseDelimiterCnt(char delim) {
            System.out.println(delim);

            int value = (delimiterCounts.containsKey(delim) ? delimiterCounts
                    .get(delim) : 0);
            delimiterCounts.put(delim, value++);
            System.out.println(getDelimiterCounts());
        }

        private Map<Character, Integer> getDelimiterCounts() {
            return delimiterCounts;
        }

        /**
         * Gets the delimiter based on greatest frequency of first line in file.
         * 
         * @return String
         */
        private char mostCommonDelimiter() {
            char theDelimiter = ',';
            System.out.println(delimiterCounts);
            int maxValueInMap = (Collections.max(delimiterCounts.values()));
            for (Map.Entry<Character, Integer> entry : delimiterCounts
                    .entrySet()) {
                if (entry.getValue().equals(maxValueInMap)) {
                    theDelimiter = entry.getKey();
                }
            }
            return theDelimiter;
        }
    }

}

Upvotes: 0

Views: 149

Answers (3)

user1803551
user1803551

Reputation: 13427

You can just use split to count the number of strings you get back. Just subtract 1 because the number of dividers is always 1 less then the number of parts.

public class Parser {

    static String line = "There|is|data,in,this:file|hause";
    private static final String[] DELIMS = {",", ";", "\\|", ":"};
    private static Map<String, Integer> count = new HashMap<>();

    public static void main(String[] args) {

        for (String delim : DELIMS)
            count.put(DELIMS[i], line.split(DELIMS[i]).length - 1);
    }
}

Upvotes: 0

ruakh
ruakh

Reputation: 183514

The bug is in this line:

            delimiterCounts.put(delim, value++);

++ after a variable-name is a post-increment, so although value++ increments value, it still evaluates to the original value. So, the above is equivalent to this:

            delimiterCounts.put(delim, value);
            value = value + 1; // pointless, since we never use value again

Instead you should write:

            delimiterCounts.put(delim, value + 1);

Your debugging output actually should have been enough to tell you this; it shows you that every delimiter in delimiterCounts gets mapped to zero.

I suggest you read Eric Lippert's blog post, "How to debug small programs". You will probably find it useful for the future.

(Note: after fixing the above issue, you'll have another problem. Your program will now correctly identify | as the delimiter, but line.split("|") does not mean what you want. To fix that one, you'll want to use Pattern.quote. I'll let you figure out the details.)


Edited to add: Since you have concerns about your OOP, I should also say something about the overall design. You may be able to create a more robust and efficient design by using an enumeration:

public enum Delimiter {
    COMMA(','), SEMI(';'), PIPE('|'), COLON(':');

    public final char c;
    public final Pattern pattern;

    private DelimiterChar(final char c) {
        this.c = c;
        this.pattern = Pattern.compile(Pattern.quote(Character.toString(c)));
    }
}

You can then use Delimiter.values() when you need to enumerate over the possible delimiters, and can use an EnumSet<Delimiter, Integer> to store the counts by delimiter.

Note that I've used the name Delimiter for something very different than you have. Your Delimiter class should probably be called DelimiterCounter, since its instances count delimiters, rather than themselves delimiting anything.

Upvotes: 2

Makoto
Makoto

Reputation: 106490

Your findDelmiter method is broken. And, it's all one character's fault.

String#split parses its arguments as a regular expression. The pipe character has special meaning in regex; that is, it is used to denote a branching match.

If you want to use a literal pipe, you have to escape it. You can't do that in a single character literal (as escaping only has contextual meaning for a String, not a char.

Why not just use a regular expression for the split method entirely? Let's not worry about picking out which one we used there, when we could rebuild that if need be.

Here's a snippet.

String[] columns = line.split(",|;|\\||:");

That complicated-looking regex is actually using the pipe character to branch - it will split on either a comma, a semi-colon, a pipe (properly escaped), or a colon.

I admire the fact that you're using a helper class to determine the counts of your delimiters. But, you would want to be careful about how you insert into the map - be sure that the key exists first, and if it does, pull the current value out of it, and add one to it, then put it back.

Here's a snippet.

private static void placeIntoMap(final Map<Character, Integer> counts, final char c) {
    if(counts.containsKey(c)) {
        counts.put(c, counts.get(c) + 1);
    } else {
        counts.put(c, 1);
    }
}

Upvotes: 1

Related Questions