Reputation: 444
Just playing around and pulling some data off a site to manipulate when I come across this:
String request = "http://foo";
String data = "bar";
Connection.Response res = Jsoup.connect(request).data(data).method(Method.POST).execute();
Document doc = res.parse();
Elements all = doc.select("td");
for(Element elem : all){
String test = elem.text();
if(test.equals(" ")){
//redefine test to 0 and print it
}
else{
//print it
}
The site in question is coded as so:
<td align="center">Henry</td>
<td>23</td>
<td align="center">Savannah</td>
<td>15</td></tr>
...
<td align="center"> </td>
<td> </td>
<td align="center">Jane</td>
<td>15</td></tr>
In my for loop, test
is never redefined.
I've debugged in Eclipse and String test
is showing as so:
Edit
Debugging test chartAt(0):
org.jsoup.nodes.Element.text()
says "Returns unencoded text or empty string if none". I'm assuming the unencoded part has something to do with this, but I can't figure it out.
I ran a test program:
public static void main(String[] args) {
String str = " ";
if (str.equals(" ")){
System.out.println("True");
}
}
and it returns true.
What gives?
Upvotes: 2
Views: 241
Reputation: 280112
I don't know if you control the HTML being sent in the body of the response or if that is what you see in a browser's source page or elsewhere
<td> </td>
But it's possible the actual content is
<td> </td> // or  
where  
is the HTML entity for the non-breaking space.
In java, you can represent it as
char nbsp = 160;
So you could just check for both char
values, the one for space and the one for non-breaking space.
Note that there might be other codepoints that are represented as white space. You need to know what you're looking for.
Upvotes: 3