ThumbsDP

Reputation: 543

JSOUP: Parsing Javascript fields from an HTML document?

I'm fairly new to jsoup, and I've had no issues parsing with Element.select on tags or id values. The issue I'm having is how to screen-scrape JavaScript code in the page. Here is how I load the document:

Document doc = Jsoup.connect(pageUrl)
                .userAgent(Agent)
                .timeout(5000)
                .get();

The JavaScript values I'm trying to extract are the following:

arrayGPSLocation["0"]    = "-19473982376,6848295867";
arrayGPSLocation["1"]    = "-19473982376,6848296245";

Since these array values are not inside a standard HTML tag (<>), is jsoup the appropriate tool for this? I like jsoup's API. The only other method I can think of is hacking together a String routine, i.e.:

// Find the assignment for the given index, then the first ';' after it.
int start = pageBuffer.indexOf("arrayGPSLocation[\"" + counter + "\"]");
int end = pageBuffer.indexOf(";", start);
String result = pageBuffer.substring(start, end);

I expect this pseudo-code to have a serious performance problem when parsing a large page. Does anyone know how to accomplish this with jsoup, or should I write my own scraper?

Upvotes: 2

Views: 2495

Answers (1)

vacuum

Reputation: 2273

All you can do with jsoup is select the Element that contains the JavaScript code, get its content as a String, and work with that string, just as you are doing in your example.
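As a rough sketch of the "work with this string" step, a regular expression can pull out every assignment in a single pass over the script text instead of one indexOf search per index. The class name GpsLocationExtractor and the method extract are hypothetical names chosen for illustration, and the pattern assumes the assignments look exactly like the two lines in the question. The extraction itself uses only the standard library; with jsoup, the script text would typically be obtained from Element.data() on each selected script element, as noted in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GpsLocationExtractor {

    // Matches lines such as: arrayGPSLocation["0"]    = "-19473982376,6848295867";
    // Group 1 is the index, group 2 is the quoted value.
    private static final Pattern ASSIGNMENT = Pattern.compile(
            "arrayGPSLocation\\[\"(\\d+)\"\\]\\s*=\\s*\"([^\"]*)\"\\s*;");

    /** Returns index -> value for every arrayGPSLocation assignment found in the script text. */
    public static Map<String, String> extract(String scriptText) {
        Map<String, String> locations = new LinkedHashMap<>();
        Matcher m = ASSIGNMENT.matcher(scriptText);
        while (m.find()) {
            locations.put(m.group(1), m.group(2));
        }
        return locations;
    }

    public static void main(String[] args) {
        // Assumption: with jsoup, the input would come from something like
        //   for (Element script : doc.select("script")) { extract(script.data()); }
        // Here the sample lines from the question are used directly.
        String script = "arrayGPSLocation[\"0\"]    = \"-19473982376,6848295867\";\n"
                      + "arrayGPSLocation[\"1\"]    = \"-19473982376,6848296245\";\n";
        extract(script).forEach((index, value) -> System.out.println(index + " -> " + value));
    }
}
```

The map preserves the order in which the assignments appear, and each value can then be split on the comma if the two numbers are needed separately.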

Upvotes: 2
