lucasdc

Reputation: 1042

Getting content from a website in Java

I want to get all the content of this page: http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/

especially the elements at the bottom right of the screen, labeled 'estatisticas'.

I tried inspecting the page with Firebug and fetching the HTML with jsoup, but it didn't work: jsoup couldn't find the content I wanted, which was frustrating. I don't know which techniques or APIs I should use to get all the data from the page, and I'd appreciate any help.

Thanks in advance.

Upvotes: 0

Views: 317

Answers (3)

Judking

Reputation: 6371

If you intend to crawl a website, you can use Apache HttpClient, which supports almost every HTTP operation. Here's a code snippet that may suit what you want:

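import java.io.InputStream;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// HttpClient 4.x API; DefaultHttpClient is deprecated since 4.3 but still available.
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    // The entity stream holds the raw response body.
    InputStream instream = entity.getContent();
    try {
        // do something useful
    } finally {
        // Always close the stream so the connection is released.
        instream.close();
    }
}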

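P.S. The Maven dependency for HttpClient. Note that DefaultHttpClient and HttpGet belong to the 4.x API, which is published as org.apache.httpcomponents:httpclient, not as the older commons-httpclient 3.1 (the version below is one example of a 4.x release):

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.14</version>
</dependency>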

Hope it helps:)

Upvotes: 0

Simmant

Reputation: 1513

For that you need an HTML parser such as jsoup or HTML Parser. If you want the whole page source, including the HTML tags, you can also try this code:

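import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

URL url = new URL("http://www.example.com");
// try-with-resources closes the reader (and the underlying stream) when done.
// UTF-8 is an assumption; use the charset the server actually declares.
try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
    String line;
    // Print the raw HTML line by line.
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}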

Upvotes: 0

user1864610

The 'estatisticas' are loaded by an AJAX call after the page itself has loaded, so you can't scrape them from the page's HTML: they're not in it.

You can, however, get them in JSON format at this address: http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/estatisticas.json
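
As a rough sketch of how that JSON could be fetched from Java using only the standard library (the class name StatsFetcher and its two methods are hypothetical helpers, not part of any library, and the sketch assumes the server replies with UTF-8 text and that the address above still responds):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name and structure are illustrative only.
class StatsFetcher {

    // Reads the whole stream into memory and decodes it as UTF-8 (assumed encoding).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toString(StandardCharsets.UTF_8.name());
    }

    // Issues a plain GET request and returns the response body as text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);    // milliseconds
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling StatsFetcher.fetch(...) with the estatisticas.json address should return the raw JSON text, which you can then hand to whatever JSON library you prefer. The layout of that JSON isn't shown here, so inspect the response before writing any parsing code.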

Upvotes: 2
