Taha Ben MOHAMED

Reputation: 77

Can't scrape the data that I'm looking for

I am trying to scrape the prices and dates from the table shown in the attached picture, at this URL: http://www.airfrance.fr/vols/paris+tunis

I managed to scrape some information, but not the data I'm looking for (date + price). I used this code:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) {
        try {
            // fetch the page source as delivered by the server
            Document doc = Jsoup.connect("http://www.airfrance.fr/vols/paris+tunis").get();
            // select every <div> and print its text
            Elements divs = doc.select("div");
            for (Element e : divs) {
                System.out.println(e.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Running this code gives me only some prices and a few dates, not the whole table shown in the picture below.

enter image description here

Can you please help me solve this problem? It is for a study project. Thanks.

Upvotes: 2

Views: 254

Answers (1)

Zack

Reputation: 4037

The problem is that the calendar you are trying to parse is not in the original source code (right click > view source) as delivered by the server. That table is generated by JavaScript when the browser renders the page (right click > inspect).
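If you want to confirm this without a browser, one option is to check whether the markup you actually received mentions the class you plan to select. The following is only a sketch using plain Java: the class name StaticSourceCheck, the helper mentionsClass, and both sample HTML strings are made up for illustration and are not taken from the Air France page.

```java
import java.util.regex.Pattern;

class StaticSourceCheck {

    // Returns true if some class="..." attribute in the markup lists the given class name.
    // This is a rough text check, not an HTML parser.
    static boolean mentionsClass(String html, String className) {
        Pattern p = Pattern.compile(
                "class\\s*=\\s*[\"'][^\"']*(?<![\\w-])"
                        + Pattern.quote(className)
                        + "(?![\\w-])[^\"']*[\"']");
        return p.matcher(html).find();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for what the server sends: an empty container plus a script.
        String served = "<div id=\"calendar\"></div><script>buildCalendar();</script>";

        // Hypothetical stand-in for the DOM after the browser has run that script.
        String rendered = "<div id=\"calendar\"><table><tr>"
                + "<td class=\"available daySelection\">"
                + "<span class=\"cellPrice\">302</span></td></tr></table></div>";

        System.out.println("served:   " + mentionsClass(served, "cellPrice"));
        System.out.println("rendered: " + mentionsClass(rendered, "cellPrice"));
    }
}
```

If the check fails on the HTML string you fetched, no selector will find those cells in it, whatever parser you use.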

Jsoup only parses the HTML it is given; it does not execute JavaScript. So you need to load the page first with something like HtmlUnit, then pass the rendered page to Jsoup.

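// HtmlUnit 2.x package names; HtmlUnit 3.x uses org.htmlunit instead
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Test {
    public static void main(String[] args) throws Exception {
        // load page using HtmlUnit and fire scripts;
        // try-with-resources closes the client when done
        try (WebClient webClient = new WebClient()) {
            HtmlPage myPage = webClient.getPage("http://www.airfrance.fr/vols/paris+tunis");

            // serialize the rendered page to markup and parse it into a Jsoup document
            Document doc = Jsoup.parse(myPage.asXml());

            // find all of the date/price cells
            for (Element cell : doc.select("td.available.daySelection")) {
                String cellDate = cell.select(".cellDate").text();
                String cellPrice = cell.select(".cellPrice > .day_price").text();
                System.out.println(
                        String.format("cellDate=%s cellPrice=%s", cellDate, cellPrice));
            }
        }
    }
}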

Console

cellDate=1 septembre cellPrice=302 €
cellDate=2 septembre cellPrice=270 €
cellDate=3 septembre cellPrice=270 €
cellDate=4 septembre cellPrice=270 €
cellDate=5 septembre cellPrice=270 €
....
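The extracted values are still plain text. If you need to compare or sort the prices, a small helper along these lines could turn a string such as "302 €" into a number. This is a sketch only: PriceText and parseWholeEuros are hypothetical names, and it assumes whole-euro prices with no decimal part, as in the console output above.

```java
class PriceText {

    // Keeps only the ASCII digits of a scraped price string and parses them as whole euros.
    // Assumption: no decimal part; a value like "270,50" would be misread as 27050.
    static int parseWholeEuros(String priceText) {
        String digits = priceText.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("no digits in: " + priceText);
        }
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        // \u20ac is the euro sign, escaped so the file compiles under any source encoding
        System.out.println(parseWholeEuros("302 \u20ac")); // prints 302
        System.out.println(parseWholeEuros("270 \u20ac")); // prints 270
    }
}
```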

Source: Parsing JavaScript Generated Pages

Upvotes: 1
