Reputation: 2472
I have a large HTML document in a String variable and I want to extract the contents of one div. I cannot rely on regular expressions because the div can contain nested divs. Suppose I have the following String:
String test = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
How can I get the following out of it with a simple Java program?
<div id="mainContent">foo bar<div>good best better</div> <div>test test</div></div>
My approach so far is something like this (it might be horrible; I'm still fighting to get it right):
public static void main(String[] args) {
    int count = 1;
    int fl = 0;
    String s = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
    String tmp = s;
    int len = s.length();
    for (int i = 0; i < len; i++) {
        int st = s.indexOf("div>");
        if (st > -1) {
            char c = s.charAt(st - 1);
            if (c == '/') {
                count--;
            } else {
                count++;
            }
            s = s.substring(st + 4);
            System.out.println(s);
            i = i + st;
            System.out.println(c + " -- " + st + " -- " + count + " -- " + i);
            if (count == 0) {
                fl = i;
                break;
            }
        }
    }
    System.out.println("final ind - " + fl);
    s = tmp.substring(0, fl + 4);
    System.out.println("final String - " + s);
}
Upvotes: 0
Views: 206
Reputation:
I would recommend using jsoup to parse the HTML and find what you are looking for.
It certainly meets the "simple" requirement: you can do what you want in just a couple of lines of code!
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.
Among other things, it can scrape and parse HTML from a URL, file, or string, and find and extract data using DOM traversal or CSS selectors.
jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup; in every case it will create a sensible parse tree.
Using the selector syntax makes finding and extracting data extremely simple.
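import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Main {
    public static void main(final String[] args) {
        final String s = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
        final Document d = Jsoup.parse(s);            // parse the string into a DOM
        final Elements e = d.select("#mainContent");  // CSS id selector
        System.out.println(e.get(0));                 // prints the element's outer HTML
    }
}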
outputs
<div id="mainContent">
foo bar
<div>
good best better
</div>
<div>
test test
</div>
</div>
It doesn't get much simpler than that!
Upvotes: 2
Reputation: 17535
I'm afraid the answer is: You don't. At least not with a "simple" program...
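To make that concrete, here is a minimal sketch of a hand-rolled depth-counting scan. It is not a general solution: the class and method names (DivExtractor, extractDivById) are made up for illustration, and it assumes the target tag is written exactly as <div id="..."> with id as the first attribute, that all tags are lowercase, and that the input contains no comments, scripts, or attribute values with "<div" or "</div>" in them. Any of those will throw the count off, which is why it should not be trusted on real-world HTML.

```java
class DivExtractor {

    // Returns the outer HTML of the first <div id="..."> with the given id,
    // or null if it is missing or its divs are unbalanced.
    // Assumption: well-formed, lowercase markup with id as the first attribute.
    static String extractDivById(String html, String id) {
        int start = html.indexOf("<div id=\"" + id + "\"");
        if (start < 0) {
            return null;
        }
        int depth = 0;
        int pos = start;
        while (pos < html.length()) {
            int open = html.indexOf("<div", pos);    // next opening tag, with or without attributes
            int close = html.indexOf("</div>", pos); // next closing tag
            if (close < 0) {
                return null; // ran out of closing tags: unbalanced input
            }
            if (open >= 0 && open < close) {
                depth++;                 // an opening tag comes first: go one level deeper
                pos = open + "<div".length();
            } else {
                depth--;                 // a closing tag comes first: go one level up
                pos = close + "</div>".length();
                if (depth == 0) {
                    return html.substring(start, pos); // matching close of the target div
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String test = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
        // prints <div id="mainContent">foo bar<div>good best better</div> <div>test test</div></div>
        System.out.println(extractDivById(test, "mainContent"));
    }
}
```

Searching for "<div" rather than "div>" is what lets the scan see opening tags that carry attributes, but it would also match a tag such as <divider>, so even this version is only safe on input you control.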
But there is hope: You can use an HTML parser library (like NekoHTML or HTMLParser, although the latter project seems to be dead) to parse the string and retrieve the part you need.
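If pulling in a third-party library is not an option, a rough alternative is the XML parser that ships with the JDK. This is a different technique from the libraries named above, and it only works when the markup happens to be well-formed XML (as the sample string is); typical HTML with unclosed tags or unquoted attributes will be rejected. The class and method names below (XmlDivExtractor, extractDivById) are invented for this sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XmlDivExtractor {

    // Returns the serialized <div> with the given id, or null if none matches.
    // Assumption: the input is well-formed XML; real-world HTML often is not.
    static String extractDivById(String xhtml, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // The id is spliced into the XPath expression, so pass only trusted literals.
            Node div = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("//div[@id='" + id + "']", doc, XPathConstants.NODE);
            if (div == null) {
                return null;
            }

            // Serialize just that subtree, without the <?xml ...?> declaration.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(div), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be processed as XML", e);
        }
    }

    public static void main(String[] args) {
        String test = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
        System.out.println(extractDivById(test, "mainContent"));
    }
}
```

The XML serializer may write elements differently from the input (an empty div, for example, can come out as a self-closing tag), so treat the result as equivalent markup rather than a byte-for-byte copy.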
Upvotes: 0