Muratcan

Reputation: 235

Screen Scraping using JSoup

I want to scrape data from this website: http://myservices.ect.nl/tracing/objectstatus/Pages/Overview.aspx

I have used JSoup before for more static HTML sites, but this one is more difficult: before I get the HTML table on the site, I have to click a button, and I don't know whether JSoup can be used to operate the button.

After clicking this button I get an HTML table. I want to get only the data where the modality is Barge.

Thank you for the tip to use Firefox. I now get the table, but together with some other page information. Can you tell me how I can get only the table information? The output I get was attached as a screenshot.

Upvotes: 3

Views: 1973

Answers (3)

Makky

Reputation: 17461

You will have to use Selenium for that, either with the HtmlUnit driver or with a real browser driver such as Firefox (used in the example below).

Selenium Info

Maven/Download Binary JAR

HTML Unit Driver

Here is a full working example. It visits the website, selects Barge, clicks the button, and then reads the data from the page.

Edit: get only the table values

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.Select;

public class GetData {

    public static void main(String[] args) throws InterruptedException {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://myservices.ect.nl/tracing/objectstatus/Pages/Overview.aspx");
            Thread.sleep(5000); // crude fixed wait for the page to load

            // select "Barge" in the modality drop-down
            new Select(driver.findElement(By.id("ctl00_ctl15_g_ce17bd4b_3803_47f6_822a_2b8dd10fc67d_ctl00_dlModality"))).selectByVisibleText("Barge");
            Thread.sleep(3000);

            // click the button that loads the table
            driver.findElement(By.className("button80")).click();
            Thread.sleep(5000); // crude fixed wait for the table to render

            // get only the table text
            WebElement table = driver.findElement(By.className("grid-view"));
            String htmlTableText = table.getText();

            // do whatever you want now; these are the raw table values
            System.out.println(htmlTableText);
        } finally {
            // quit() closes every window and ends the session,
            // so a separate close() call is not needed
            driver.quit();
        }
    }
}
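
If you need the rows individually rather than as one string, the raw text can be split afterwards. This is only a rough sketch: it assumes that getText() returns one table row per line, and the class name, method name, and sample rows are made up for illustration (they are not taken from the real site).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TableTextRows {

    // Splits raw table text into non-empty lines and keeps those containing the keyword.
    // Assumption: each table row ends up on its own line in the text.
    public static List<String> rowsContaining(String tableText, String keyword) {
        return Arrays.stream(tableText.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .filter(line -> line.contains(keyword))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for htmlTableText from the example above.
        String sample = "Object Modality Status\n"
                + "Object A Barge Arrived\n"
                + "\n"
                + "Object B Rail Expected\n";
        rowsContaining(sample, "Barge").forEach(System.out::println);
    }
}
```

Since Barge is already selected in the drop-down, the keyword check mainly serves to drop the header line and any stray text around the table.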

Upvotes: 3

jpkroehling

Reputation: 14061

Every "click" (or any interaction of that sort) is a request to the server and a response to the browser. So, a possible solution is to use JSoup not for the initial page, but for the result page. For instance, send a POST to the page that returns the table, passing the parameter that selects the modality Barge. You can use a tool like Firebug (for Firefox) or Chrome Developer Tools to inspect the conversation (request/response), so that you can emulate it in your own code.
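
As a rough illustration of emulating such a request, here is a minimal sketch that builds a form POST with the JDK's own java.net.http classes (Java 11+). The field name "modality" is a placeholder I made up; the real names and values have to be copied from the request you see in the developer tools. Pages of this kind (.aspx) often also expect hidden fields such as __VIEWSTATE to be sent back, which is an assumption to verify there as well. The sketch only builds and prints the request; it does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FormPostSketch {

    // Encodes the fields as application/x-www-form-urlencoded (key=value pairs joined by '&').
    public static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) a POST request carrying the encoded form fields.
    public static HttpRequest buildPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Hypothetical field name; replace with the names seen in the captured request.
        fields.put("modality", "Barge");

        HttpRequest request = buildPost(
                "http://myservices.ect.nl/tracing/objectstatus/Pages/Overview.aspx", fields);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(encodeForm(fields));
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and hand the returned HTML body to JSoup for parsing the table.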

Upvotes: 2

XZen

Reputation: 445

A browser emulator for Java may be useful for your problem. Please consider HtmlUnit.

It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc., just like you do in your "normal" browser.

HTMLUnit

Upvotes: 0
