Reputation: 1042
I want to get all the content of this website: http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/
especially the elements at the bottom right of the screen, called 'estatisticas'.
I've tried inspecting the page with Firebug and fetching the HTML with jsoup, but it didn't work: jsoup couldn't find the content I wanted, which was frustrating. I don't know which techniques or APIs I'm supposed to use to get all the data from the website, and I'd appreciate any help.
Thanks in advance.
Upvotes: 0
Views: 317
Reputation: 6371
If you intend to crawl a website, you can use Apache HttpClient, which supports almost all HTTP protocol operations. Here's a code snippet that may suit what you want:
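import java.io.InputStream;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// HttpClient 4.x API (DefaultHttpClient is deprecated since 4.3 but still available)
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    // The response body is exposed as a stream
    InputStream instream = entity.getContent();
    try {
        // do something useful
    } finally {
        // Always close the stream so the connection is released
        instream.close();
    }
}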
P.S. The Maven dependency for HttpClient:
<!-- HttpClient 4.x, which provides DefaultHttpClient and HttpGet -->
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.14</version>
</dependency>
Hope it helps:)
Upvotes: 0
Reputation: 1513
For that you need to explore an HTML parser such as jsoup or HTML Parser. If you just want the raw source of the page, including the HTML tags, you can also try this code:
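import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

URL url = new URL("http://www.example.com");
// try-with-resources closes the reader and the underlying stream; UTF-8 is assumed here
try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}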
Upvotes: 0
Reputation:
The 'estatisticas' are loaded after the page load by an AJAX call - you can't scrape them from the page because they're not there.
You can, however, get them in JSON format at this address: http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/estatisticas.json
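Since the statistics come from a separate JSON endpoint, a plain HTTP GET against that address should be enough. Here is a minimal sketch using only the JDK. The class name StatsFetcher, the 10-second timeouts, the Accept header and the UTF-8 decoding are my own assumptions, not something the site documents, and the structure of the returned JSON is not shown here because I haven't verified it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and structure are illustrative only.
class StatsFetcher {

    // Downloads the resource at the given address and returns its body as text.
    // Assumes the body is UTF-8 encoded.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setConnectTimeout(10000); // assumed timeout, in milliseconds
        conn.setReadTimeout(10000);    // assumed timeout, in milliseconds
        conn.setRequestProperty("Accept", "application/json");

        // For HTTP(S) URLs, fail early on anything other than 200 OK.
        if (conn instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) conn).getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
        }

        // Copy the whole body into memory, then decode it as UTF-8.
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Calling StatsFetcher.fetch("http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/estatisticas.json") should give you the raw JSON text, which you can then hand to a JSON library such as Gson or Jackson instead of an HTML parser.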
Upvotes: 2