String

Reputation: 3728

HTML text extraction in J2ME

I have a String taken from an HTML web page, like this:

String htmlString =

<span style="mso-bidi-font-family:Gautami;mso-bidi-theme-font:minor-bidi">President Pranab pay great 
tributes to Motilal Nehru on occasion of 
</span>
150th birth anniversary. Pranab said institutions evolved by 
leaders like him should be strengthened instead of being destroyed. 
<span style="mso-spacerun:yes">&nbsp;
</span>
He listed his achievements like his role in evolving of Public Accounts Committee and protecting independence of 
Legislature from the influence of the Executive by establishing a separate cadre for the Central Legislative Assembly,   
the first set of coins and postal stamps released at the function to commemorate the event.
</p> 

I need to extract the text from the above String. After extraction, my output should look like this:

Output:

President Pranab pay great tributes to Motilal Nehru on occasion of 150th birth anniversary. Pranab said institutions evolved by leaders like him should be strengthened instead of being destroyed.  He listed his achievements like his role in evolving of Public Accounts Committee and protecting independence of Legislature from the influence of the Executive by establishing a separate cadre for the Central Legislative Assembly, now Parliament. Calling himself a student of history, he said Motilal's Swaraj Party acted as a disciplined assault force in the Legislative Assembly and he was credited with evolving the system of a Public Accounts Committee which is now one of the most effective watchdogs over executive in matters of money and finance. Mukherjee also received the first set of coins and postal stamps released at the function to commemorate the event.

For this I have used the logic below:

int spanIndex = content.indexOf("<span");
spanIndex = content.indexOf(">", spanIndex);
int endspanndex = content.indexOf("</span>", spanIndex);
content = content.substring(spanIndex  + 1, endspanndex);

and my resulting output is:

President Pranab pay great tributes to Motilal Nehru on occasion of

I have tried different HTML parsers, but they do not work on J2ME.

Can anyone help me get the full description text? Thanks.

Upvotes: 1

Views: 506

Answers (4)

String

Reputation: 3728

Since J2ME does not support HTML parsers, we can extract the text by stripping the tags ourselves, like this:

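private String removeHtmlTags(String content) {

        // StringBuffer is used because CLDC has no StringBuilder.
        StringBuffer text = new StringBuffer();
        int pos = 0;

        while (pos < content.length()) {
            int beginTag = content.indexOf('<', pos);
            if (beginTag == -1) {
                // No more tags: keep the remaining text.
                text.append(content.substring(pos));
                break;
            }
            // Keep the text that precedes the tag.
            text.append(content.substring(pos, beginTag));
            // Look for '>' only after the '<', so a stray '>' in the text is ignored.
            int endTag = content.indexOf('>', beginTag);
            if (endTag == -1) {
                // Unterminated tag: drop the rest instead of looping forever.
                break;
            }
            pos = endTag + 1;
        }
        return text.toString();
    }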
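
Stripping the tags still leaves the `&nbsp;` entity and the line breaks from the original markup in the result. The following is a minimal sketch of a clean-up step, assuming that replacing `&nbsp;` with a space and collapsing every run of whitespace to a single space is acceptable for the desired output. The class name `TextCleanup` and the method `normalize` are hypothetical names chosen for illustration, and only `String` and `StringBuffer` calls are used, on the assumption that they are the ones available on CLDC.

```java
// Hypothetical helper for illustration; not part of any J2ME API.
final class TextCleanup {

    static String normalize(String text) {
        // Step 1: replace every "&nbsp;" entity with a plain space.
        String entity = "&nbsp;";
        StringBuffer replaced = new StringBuffer();
        int pos = 0;
        int hit;
        while ((hit = text.indexOf(entity, pos)) != -1) {
            replaced.append(text.substring(pos, hit)).append(' ');
            pos = hit + entity.length();
        }
        replaced.append(text.substring(pos));

        // Step 2: collapse runs of spaces, tabs and line breaks into one space,
        // dropping leading and trailing whitespace along the way.
        StringBuffer out = new StringBuffer();
        boolean pendingSpace = false;
        for (int i = 0; i < replaced.length(); i++) {
            char c = replaced.charAt(i);
            if (c == ' ' || c == '\n' || c == '\r' || c == '\t') {
                // Remember the gap, but only if some text has been emitted already.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

It would be applied to the tag-free string, for example `TextCleanup.normalize(removeHtmlTags(htmlString))`. Other entities such as `&amp;` are left untouched by this sketch.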

Upvotes: 1

Richard

Reputation: 8920

If you are using BlackBerry OS 5.0 or later you can use the BrowserField to parse HTML into a DOM document.

Upvotes: 2

radkovo

Reputation: 888

You can continue the same way you proposed with the rest of the string. Alternatively, a simple finite-state automaton would solve this. I have seen such a solution in the moJab project (you can download the sources here). In the mojab.xml package, there is a minimalistic XML parser designed for J2ME. I think it would parse your example as well. Take a look at the sources; it's just three simple classes. It seems to be usable without modifications.
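
To illustrate the finite-state automaton idea, here is a minimal sketch with two states: one for plain text and one for the inside of a tag. It is not taken from the moJab sources; the class name `TagStripper` and the method `stripTags` are hypothetical, and the code sticks to `StringBuffer` and `charAt` on the assumption that those are available on CLDC.

```java
// Hypothetical helper for illustration; not the moJab parser.
final class TagStripper {

    // The two states of the automaton.
    private static final int TEXT = 0; // outside any tag: characters are kept
    private static final int TAG = 1;  // between '<' and '>': characters are dropped

    static String stripTags(String html) {
        StringBuffer out = new StringBuffer(html.length());
        int state = TEXT;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (state == TEXT) {
                if (c == '<') {
                    state = TAG;      // a tag starts: stop copying
                } else {
                    out.append(c);    // ordinary text: copy it
                }
            } else if (c == '>') {
                state = TEXT;         // the tag ends: resume copying
            }
        }
        return out.toString();
    }
}
```

Calling `TagStripper.stripTags(htmlString)` makes a single pass over the input. The sketch does not handle a `>` inside an attribute value, comments, or script blocks; each of those would need additional states.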

Upvotes: 1

Santosh

Reputation: 17893

jsoup is a very popular library for extracting text from HTML documents. Here is one such example.

Upvotes: 0
