Chris Salij

Reputation: 3126

Weird Java String comparison

I'm having a minor issue with Java String comparisons.

I've written a class which takes in a String and parses it into a custom tree type. I've also written a toString method which converts this tree back into a String. As part of my unit tests, I'm checking that the String generated by toString is the same as the String that was parsed in the first place.

Here is my simple test, with a few printouts so that we can see what's going on:

final String exp1 = "(a|b)";
final String exp2 = "((a|b)|c)";
final Node tree1 = Reader.parseExpression2(exp1);
final Node tree2 = Reader.parseExpression2(exp2);
final String t1 = tree1.toString();
final String t2 = tree2.toString();

System.out.println(":" + exp1 + ":" + t1 + ":");
System.out.println(":" + exp2 + ":" + t2 + ":");

System.out.println(exp1.compareToIgnoreCase(t1));
System.out.println(exp2.compareToIgnoreCase(t2));

System.out.println(exp1.equals(t1));
System.out.println(exp2.equals(t2));

This produces the following output (NB: the ":" characters are used as delimiters so I can make sure there's no extra whitespace):

:(a|b):(a|b):
:((a|b)|c):((a|b)|c):
-1
-1
false
false

Comparing them manually, exp1 and exp2 are exactly the same as t1 and t2 respectively, but for some reason Java insists they are different.

This isn't the obvious mistake of using == instead of .equals(), but I'm stumped as to why two seemingly identical strings are different. Any help would be much appreciated :)

Upvotes: 2

Views: 1059

Answers (3)

Akintayo Olusegun

Reputation: 917

I have some suggestions:

  • Copy each output, paste it into Notepad (or any similar editor), then copy it again and do something like this:

    System.out.println("(a|b)".compareToIgnoreCase("(a|b)"));

  • Print out the integer value of each character. If one of them is an unusual Unicode character, its integer value will be different.

  • Also, which version of the JDK are you using?
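A minimal sketch of the "print the integer value of each character" suggestion. The class name DumpChars and the sample strings are hypothetical stand-ins for exp1 and t1; here the second string is assumed to carry a hidden trailing NUL purely for illustration.

```java
public class DumpChars {
    // Describes each char of s on its own line as "index: U+XXXX (decimal)",
    // so invisible or look-alike characters show up as unexpected numbers.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(String.format("%d: U+%04X (%d)", i, (int) c, (int) c)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical inputs: in the question these would be exp1 and t1.
        String expected = "(a|b)";
        String actual = "(a|b)\0";
        System.out.print(describe(expected));
        System.out.println("---");
        System.out.print(describe(actual));
    }
}
```

Any line that appears in one dump but not the other, or shows a different number at the same index, points at the offending character.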

Upvotes: 1

Luke Woodward

Reputation: 64959

Does one of your strings have a null character within it? These might not be visible when you use System.out.println(...).

For example, consider this class:

public class StringComparison {
    public static void main(String[] args) {
        String s = "a|b";
        String t = "a|b\0";
        System.out.println(":" + s + ":" + t + ":");
        System.out.println(s.equals(t));
    }
}

When I ran this on Linux it gave me the following output:

:a|b:a|b:
false

(I also ran it on Windows, but the null character showed up as a space.)
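Building on the class above, a small sketch of how the hidden character could be confirmed without relying on how the console renders it: compare the lengths and search for '\0' directly. The class and method names are illustrative only. Note that when one string is a prefix of the other, String.compareTo returns the difference in lengths, so a result of -1 is consistent with (though not proof of) exactly one extra trailing character.

```java
public class NullCharCheck {
    // Returns the index of the first NUL ('\0') character, or -1 if there is none.
    static int indexOfNul(String s) {
        return s.indexOf('\0');
    }

    public static void main(String[] args) {
        String s = "a|b";
        String t = "a|b\0";
        // The lengths differ even though the printed text looks identical.
        System.out.println(s.length() + " vs " + t.length()); // 3 vs 4
        System.out.println(indexOfNul(t));                     // 3
        // s is a prefix of t, so compareTo returns 3 - 4 = -1.
        System.out.println(s.compareTo(t));                    // -1
    }
}
```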

Upvotes: 3

paxdiablo

Reputation: 881423

Well, it certainly looks okay. What I would do is iterate over both strings, using charAt to compare each character with its counterpart in the other string. At a minimum, this should tell you which character is the culprit.

Also output everything else you can find out about both strings, such as their lengths.
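The charAt comparison described above might look something like this sketch (class and method names are hypothetical). It reports both lengths and the first index at which the strings diverge, treating a missing character on the shorter side as -1.

```java
public class FirstDifference {
    // Returns the first index where a and b differ, or -1 if they are identical.
    // If one string is a prefix of the other, returns the shorter string's length.
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : n;
    }

    static void report(String a, String b) {
        System.out.println("lengths: " + a.length() + " vs " + b.length());
        int i = firstDifference(a, b);
        if (i < 0) {
            System.out.println("identical");
        } else {
            // Numeric value of each side at the mismatch; -1 means "no character here".
            int ca = i < a.length() ? a.charAt(i) : -1;
            int cb = i < b.length() ? b.charAt(i) : -1;
            System.out.println("first difference at index " + i + ": " + ca + " vs " + cb);
        }
    }

    public static void main(String[] args) {
        report("((a|b)|c)", "((a|b)|c)");
        // Hypothetical second input with an invisible trailing NUL.
        report("((a|b)|c)", "((a|b)|c)\0");
    }
}
```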

It could be that one of the characters, while looking the same, is actually some other Unicode doppelganger :-)
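To illustrate the doppelganger idea, here is a sketch that flags every character outside printable ASCII. It assumes the expected text should be plain ASCII, which seems likely for an expression like "(a|b)" but is an assumption. The look-alike used (U+2223 DIVIDES standing in for '|') is just one hypothetical example.

```java
import java.util.ArrayList;
import java.util.List;

public class Doppelganger {
    // Returns the indices of all characters outside printable ASCII (0x20 to 0x7E);
    // look-alike and invisible characters both end up in this list.
    static List<Integer> suspiciousIndices(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String ascii = "(a|b)";
        // Hypothetical look-alike: U+2223 DIVIDES in place of '|' (U+007C).
        String lookAlike = "(a\u2223b)";
        System.out.println(ascii.equals(lookAlike)); // false
        for (int i : suspiciousIndices(lookAlike)) {
            System.out.printf("index %d: U+%04X%n", i, (int) lookAlike.charAt(i));
        }
    }
}
```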

You may also want to capture that output and do a detailed binary dump of it, for example by loading it into gvim and using its hex conversion tool, or by running od -xcb (if available) on the captured output. Any difference should be obvious once you get down to the binary level.

Upvotes: 2
