Jungle Jim

Reputation: 333

Parsing string retrieved with Jsoup in Android

I am writing an Android app that reads some info from a website and displays it on the app's screen. I am using the Jsoup library to get the info as a string. First, here's what the website's HTML looks like:

<strong>
   Now is the time<br />
   For all good men<br />
   To come to the aid<br />
   Of their country<br />
</strong>

Here's how I'm retrieving and trying to parse the text:

Document document = Jsoup.connect(WEBSITE_URL).get();
resultAggregator = "";

Elements nodePhysDon = document.select("strong");

//check results
if (nodePhysDon.size() > 0) {
   //get value
   donateResult = nodePhysDon.get(0).text();
   resultAggregator = donateResult;
}

// isEmpty() checks the string's contents; != would only compare references
if (!resultAggregator.isEmpty()) {
   // split resultAggregator into an array breaking up with br /
   String[] donateItems = resultAggregator.split("<br />");
}

But then donateItems[0] is not just "Now is the time"; it is all four lines run together. I have also tried without the space between "br" and "/" and got the same result. If I do resultAggregator.split("br"); then donateItems[0] is just the first word: "Now".

Is the problem that the Jsoup select method is stripping the tags out?

Any suggestions? I can't change the website's HTML. I have to work with it as is.

Upvotes: 1

Views: 99

Answers (1)

Joel Min

Reputation: 3457

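text() returns only the element's text with the tags stripped out, so there is no <br> left to split on. Use toString() instead, which returns the selected block with its tags included: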

//check results
if (nodePhysDon.size() > 0) {
   //use toString() to get the selected block with tags included
   donateResult = nodePhysDon.get(0).toString();
   resultAggregator = donateResult;
}

//compare string contents with isEmpty(); != only compares references
if (!resultAggregator.isEmpty()) {
   //remove <strong> and </strong> tags
   resultAggregator = resultAggregator.replace("<strong>", "");
   resultAggregator = resultAggregator.replace("</strong>", "");
   //then split with <br>
   String[] donateItems = resultAggregator.split("<br>");
}

Make sure to split with <br> and not <br />, because Jsoup writes the tag back out as <br> in its default HTML output.
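If you would rather not depend on exactly how the tag is written, a more tolerant split is one option. The following is only a sketch that uses plain Java and no Jsoup calls. The class name BrSplitter and the method splitOnBr are made up for illustration, and it assumes you already have the element's HTML as a string (for example from toString() or html() on the selected element).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup.
class BrSplitter {
    // Matches <br>, <br/> and <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    // Splits an HTML fragment on line-break tags, strips any other tags
    // (such as <strong>), trims whitespace, and drops empty pieces.
    static List<String> splitOnBr(String html) {
        List<String> lines = new ArrayList<>();
        for (String piece : BR.split(html)) {
            String cleaned = piece.replaceAll("<[^>]+>", "").trim();
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same markup as in the question, hard-coded here for the demo.
        String html = "<strong>\n"
                + "   Now is the time<br />\n"
                + "   For all good men<br />\n"
                + "   To come to the aid<br />\n"
                + "   Of their country<br />\n"
                + "</strong>";
        for (String line : splitOnBr(html)) {
            System.out.println(line);
        }
    }
}
```

With this, splitOnBr(resultAggregator).get(0) would take the place of donateItems[0], and the separate replace calls for <strong> are no longer needed. Note that the pattern does not match a <br> tag that carries attributes.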

Upvotes: 1
