patterned

Reputation: 444

Jsoup-returned string " " does not return true for equals(" ")

I was just playing around, pulling some data off a site to manipulate, when I came across this:

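import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String request = "http://foo";
String data = "bar";

// POST to the page and parse the response body
Connection.Response res = Jsoup.connect(request).data(data).method(Connection.Method.POST).execute();
Document doc = res.parse();
Elements all = doc.select("td");

for (Element elem : all) {
    String test = elem.text();
    if (test.equals(" ")) {
        test = "0";               // redefine test to 0 and print it
        System.out.println(test);
    } else {
        System.out.println(test); // print it
    }
}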

The site's HTML looks like this:

<td align="center">Henry</td>
<td>23</td>
<td align="center">Savannah</td>
<td>15</td></tr>
...
<td align="center"> </td>
<td> </td>
<td align="center">Jane</td>
<td>15</td></tr>

In my for loop, test is never redefined.

I've debugged in Eclipse, and String test shows up like this:

[Screenshot: Eclipse debugger showing the value of test]

Edit

Debugging test.charAt(0):

[Screenshot: Eclipse debugger showing the value of test.charAt(0)]
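As an alternative to inspecting charAt(0) in the debugger, here is a minimal sketch that prints the numeric value of every character in a string, so an ordinary space and a look-alike character can be told apart. The class and method names (`CodePointDump`, `describe`) are hypothetical; in the loop above you would call `describe(test)`.

```java
import java.util.stream.Collectors;

public class CodePointDump {
    // Returns every code point of s as "U+XXXX", separated by spaces,
    // e.g. "U+0020" for an ordinary space and "U+00A0" for a non-breaking space.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(describe(" "));      // U+0020
        System.out.println(describe("\u00a0")); // U+00A0
    }
}
```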


The Javadoc for org.jsoup.nodes.Element.text() says "Returns unencoded text or empty string if none". I assume the "unencoded" part has something to do with this, but I can't figure out what.

I ran a test program:

public class Test {
    public static void main(String[] args) {
        String str = " ";
        if (str.equals(" ")) {
            System.out.println("True");
        }
    }
}

and it prints True.

What gives?

Upvotes: 2

Views: 241

Answers (1)

Sotirios Delimanolis

Reputation: 280112

I don't know whether you control the HTML sent in the body of the response, or whether this is just what you see in a browser's page source or elsewhere:

<td> </td>

But it's possible the actual content is

<td>&nbsp;</td>  <!-- or &#160; -->

where &nbsp; is the HTML entity for the non-breaking space (U+00A0, decimal 160).

In Java, you can represent it as

char nbsp = 160;
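A minimal sketch of why the comparison fails, assuming the cell's text is a single non-breaking space. `NbspDemo` and its constants are illustrative names; the point is that 160 and 32 are different chars, so equals(" ") is false, and String.trim() does not help because it only strips chars up to U+0020.

```java
public class NbspDemo {
    // The non-breaking space, written as in the answer (decimal 160 = U+00A0).
    static final char NBSP = 160;
    // Illustrative cell text consisting of a single non-breaking space.
    static final String CELL_TEXT = String.valueOf(NBSP);

    public static void main(String[] args) {
        System.out.println(NBSP == '\u00a0');                 // true
        System.out.println(CELL_TEXT.equals(" "));            // false: 160 is not 32
        System.out.println((int) ' ' + " vs " + (int) NBSP);  // 32 vs 160
        System.out.println(CELL_TEXT.trim().isEmpty());       // false: trim() keeps chars above U+0020
    }
}
```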

So you could just check for both char values: the one for the ordinary space (32) and the one for the non-breaking space (160).
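A minimal sketch of that check; the helper name `isSpaceOrNbsp` is hypothetical and it only accepts a string that is exactly one of the two characters.

```java
public class BlankCell {
    private static final char SPACE = ' ';  // U+0020, decimal 32
    private static final char NBSP = 160;   // U+00A0, &nbsp; / &#160;

    // True if text is exactly one ordinary space or one non-breaking space.
    static boolean isSpaceOrNbsp(String text) {
        return text.length() == 1
                && (text.charAt(0) == SPACE || text.charAt(0) == NBSP);
    }

    public static void main(String[] args) {
        System.out.println(isSpaceOrNbsp(" "));      // true
        System.out.println(isSpaceOrNbsp("\u00a0")); // true
        System.out.println(isSpaceOrNbsp("Henry"));  // false
    }
}
```

In the question's loop, `if (isSpaceOrNbsp(test))` would take the place of `if (test.equals(" "))`.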

Note that other code points may also be rendered as white space, so you need to know exactly what you're looking for.
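If you would rather treat any kind of blank as empty, one option is to combine two standard JDK checks. Character.isWhitespace() deliberately excludes the non-breaking spaces (U+00A0, U+2007, U+202F), while Character.isSpaceChar() accepts all Unicode space separators, so checking both covers tabs, newlines and the non-breaking variants. The name `isBlank` here is a hypothetical helper, not part of jsoup.

```java
public class BlankText {
    // True if every code point in text is Java white space or a Unicode space separator.
    // isWhitespace() alone would miss U+00A0, U+2007 and U+202F,
    // so isSpaceChar() is checked as well.
    static boolean isBlank(String text) {
        return text.codePoints()
                .allMatch(cp -> Character.isWhitespace(cp) || Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(160)); // false
        System.out.println(Character.isSpaceChar(160));  // true
        System.out.println(isBlank(" \u00a0\t"));        // true
        System.out.println(isBlank("Jane"));             // false
    }
}
```

Because allMatch() is true for an empty stream, isBlank("") also returns true, which covers a cell for which text() returns the empty string.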

Upvotes: 3
