Reputation: 442
I need to scrape data from a website on a weekly basis. The data is visible only after a click on the page (which calls a JS function). The data is loaded into a table (which can be found by id). The script will run on a server without browser support. This is my code with Geb:
@Grab("org.gebish:geb-core:0.13.1")
@Grab("org.seleniumhq.selenium:selenium-firefox-driver:2.52.0")
@Grab("org.seleniumhq.selenium:selenium-support:2.52.0")
@GrabExclude('org.codehaus.groovy:groovy-all')
import geb.Browser
Browser.drive {
    // driver.webClient.javaScriptEnabled = true
    go "mysite"
    js.loadWeekData()
    println $("div.data-listing").text()
}
I've searched a lot on this topic, but nothing I found worked for headless scraping with JavaScript support. This is the recording from Selenium IDE:
driver.findElement(By.linkText("Next")).click();
I was not able to make PhantomJS work with Geb.
Edit 1: This is the error I get with PhantomJS: java.lang.NoClassDefFoundError: org/openqa/selenium/browserlaunchers/Proxies. I've read about the problem with versions, but I was not able to resolve it.
@Grab("org.gebish:geb-core:0.13.1")
@Grab("org.seleniumhq.selenium:selenium-firefox-driver:2.52.0")
@Grab("org.seleniumhq.selenium:selenium-support:2.52.0")
@Grab("com.codeborne:phantomjsdriver:1.3.0")
WebDriver driver = new PhantomJSDriver();
// Load Google.com
driver.get("http://www.google.com");
// Locate the Search field on the Google page
WebElement element = driver.findElement(By.name("q"));
In short, I need to run the first script in headless mode (if possible without installing Xvfb), preferably with a Groovy or Java solution.
Upvotes: 0
Views: 643
Reputation: 442
In the end I went with HtmlUnit. The code below needs some cleanup but generally works. The main problem with HtmlUnit, its flood of warnings and errors, is solved by switching off logging for its packages:
@Grab(group='net.sourceforge.htmlunit', module='htmlunit', version='2.21')
import com.gargoylesoftware.htmlunit.AlertHandler;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.util.logging.Level;
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
WebClient webClient = new WebClient();
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
HtmlPage currentPage = webClient.getPage("mysite");
/* HtmlButton button = (HtmlButton) currentPage.getElementById("tomorrow");
button.click();*/
//String javaScriptCode = "loadTomorrowTrain();";
String javaScriptCode = "loadYesterdayTrain();";
def result = currentPage.executeJavaScript(javaScriptCode);
//def result = page.executeJavaScript(javaScriptCode);
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
println result.getJavaScriptResult();
println "result: "+ result
def newpage = result.getNewPage()
def table = result.getNewPage().getElementById("training-days");
println table
def spans = currentPage.getByXPath( "//div[@training-days]");
println spans
def spans1 = newpage.getByXPath("//div[@class='training-days']//a");
println spans1
def spans2 = currentPage.getByXPath("//div[@class='training-days']//a");
println spans2
def spans3 = currentPage.getByXPath("//table[@id='training']");
println spans3
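A note on the logging line: here is a minimal, stdlib-only sketch of why setting the level on "com.gargoylesoftware" quiets everything below it. It assumes HtmlUnit's output ends up in java.util.logging, which, as far as I know, is what happens when no other logging backend is on the classpath. The class name QuietHtmlUnitLogging and the child logger name are only illustrative. The logger is held in a static field because java.util.logging keeps loggers weakly referenced, so a level set on an otherwise unreferenced logger can be lost after garbage collection.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {
    // Strong reference so the configured level survives garbage collection.
    private static final Logger HTMLUNIT_ROOT = Logger.getLogger("com.gargoylesoftware");

    /** Turns off every logger under the com.gargoylesoftware namespace. */
    static void silence() {
        HTMLUNIT_ROOT.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();

        // Illustrative child logger name; it has no level of its own,
        // so it inherits OFF from the parent configured above.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit.WebClient");
        System.out.println(child.isLoggable(Level.SEVERE)); // false

        // Loggers outside that namespace keep whatever level they already had.
        Logger other = Logger.getLogger("some.other.library");
        System.out.println(other.isLoggable(Level.SEVERE));
    }
}
```

In the Groovy script the same effect should come from assigning the logger to a variable that lives as long as the script before calling setLevel(Level.OFF) on it.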
Upvotes: 0