Reputation: 77
I am trying to scrape the prices and dates from the table shown in the attached picture, at this URL: http://www.airfrance.fr/vols/paris+tunis
I managed to scrape some information, but not what I am looking for (date + price). This is the code I used:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) {
        Document doc;
        try {
            // Fetch the page and parse the HTML as returned by the server
            doc = Jsoup.connect("http://www.airfrance.fr/vols/paris+tunis").get();
            // Select every div and print its text content
            Elements links = doc.select("div");
            for (Element e : links) {
                System.out.println(e.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Running this code gives me only some of the prices and a few of the dates, not the whole table as shown in the picture below.
Could you please help me solve this problem? It is for a study project. Thanks.
Upvotes: 2
Views: 254
Reputation: 4037
The problem is that the calendar you are trying to parse is not in the original source code (right click > view source) as delivered by the server. That table is generated by JavaScript when the browser renders the page (right click > inspect).
Jsoup only parses the HTML it is given; it does not execute JavaScript. So you need to load the page first with something like HtmlUnit, then pass the rendered page to Jsoup.
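import java.io.IOException;

// HtmlUnit 2.x package names (3.x moved to org.htmlunit)
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Test {
    public static void main(String[] args) throws IOException {
        // load page using HtmlUnit and fire scripts
        WebClient webClient = new WebClient();
        HtmlPage myPage = webClient.getPage("http://www.airfrance.fr/vols/paris+tunis");
        // serialize the rendered page and parse it into a Jsoup document
        Document doc = Jsoup.parse(myPage.asXml());
        // find all of the date/price cells
        for (Element cell : doc.select("td.available.daySelection")) {
            String cellDate = cell.select(".cellDate").text();
            String cellPrice = cell.select(".cellPrice > .day_price").text();
            System.out.println(
                String.format(
                    "cellDate=%s cellPrice=%s",
                    cellDate,
                    cellPrice));
        }
        // clean up resources
        webClient.close();
    }
}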
Console output:
cellDate=1 septembre cellPrice=302 €
cellDate=2 septembre cellPrice=270 €
cellDate=3 septembre cellPrice=270 €
cellDate=4 septembre cellPrice=270 €
cellDate=5 septembre cellPrice=270 €
....
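If the project needs the prices as numbers rather than printed text, the two strings read from each cell can be turned into a small value object. The following is only a sketch: `FareCells`, `Fare`, `parsePriceEuros` and `toFare` are hypothetical names that belong to neither Jsoup nor HtmlUnit, and it assumes every price is a whole number of euros, as in the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Jsoup or HtmlUnit.
final class FareCells {

    /** One calendar cell: the date text as scraped and the price in whole euros. */
    static final class Fare {
        final String date;
        final int priceEuros;

        Fare(String date, int priceEuros) {
            this.date = date;
            this.priceEuros = priceEuros;
        }
    }

    /**
     * Extracts the digits from a price text such as "302" followed by a euro sign.
     * Assumption: prices are whole euros, so every digit belongs to the amount.
     */
    static int parsePriceEuros(String priceText) {
        String digits = priceText.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("No digits in price text: " + priceText);
        }
        return Integer.parseInt(digits);
    }

    /** Builds a Fare from the two strings read from one table cell. */
    static Fare toFare(String cellDate, String cellPrice) {
        return new Fare(cellDate.trim(), parsePriceEuros(cellPrice));
    }

    public static void main(String[] args) {
        // Sample values copied from the console output; \u20AC is the euro sign.
        List<Fare> fares = new ArrayList<>();
        fares.add(toFare("1 septembre", "302 \u20AC"));
        fares.add(toFare("2 septembre", "270 \u20AC"));
        for (Fare fare : fares) {
            System.out.println(fare.date + ": " + fare.priceEuros + " EUR");
        }
    }
}
```

Inside the loop over the cells, calling `FareCells.toFare(cellDate, cellPrice)` in place of the `println` would collect the same data in a form that can be sorted or compared. If the site ever shows prices with cents, the digit-stripping approach would give wrong amounts and would need replacing.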
Source: Parsing JavaScript Generated Pages
Upvotes: 1