Jose R

Reputation: 950

R: Webpage scraping with JavaScript (need to give two values and grab tables)

I am trying to grab tables from a Department of Transportation (DOT) traffic count webpage (South Carolina, US). In this particular case, I am trying to scrape all tables for a particular location (textBox1) and a range of dates (textBox2). I've done some basic web scraping, but this is well beyond my current knowledge.

This is the webpage (add http -- don't know why they don't have a domain for this): 206.74.144.15/Poll5WebAppPublic/wfrm/wfrmViewDataNightly.aspx

The first text box (site) corresponds to the following line:

<input name="WucSites1:txtSiteID" value="0032" maxlength="4" size="1" id="WucSites1_txtSiteID" class="TextBox" style="width:40px;" type="text">

The second textbox (date) corresponds to the following line:

<input name="WucDates1:txtDate" value="08/10/2014" maxlength="10" size="8" id="WucDates1_txtDate" class="TextBox" style="width:88px;" type="text">

The "Go" button corresponds to the following line:

<input name="btnSmGOIcon" id="btnSmGOIcon" tabindex="1" title="Get Data for Chosen Site" class="cssLargeIcons" onmouseover="this.src = '../images/iconImages/GO_o.png';" onmouseout="this.src = '../images/iconImages/GO.png';" src="../images/iconImages/GO.png" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;btnSmGOIcon&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, false))" language="javascript" border="0" type="image">

I want to set the location/site to "32" and loop through the calendar days from 01/01/2013 to 08/11/2014, grabbing the table for each day and binding them all into a single data frame.

Thank you very much in advance.

Upvotes: 1

Views: 505

Answers (1)

mnel

Reputation: 115390

This is a perfect job for RSelenium.

Once you have the appropriate software installed:

library(RSelenium)
library(XML)

# start a local Selenium server (startServer() belongs to older RSelenium
# releases; newer releases provide rsDriver() instead)
startServer()
d <- remoteDriver$new()
# open a browser session and navigate to the page
d$open()
d$navigate('http://206.74.144.15/Poll5WebAppPublic/wfrm/wfrmViewDataNightly.aspx')
# set site = 0032
d$findElement('name', 'WucSites1:txtSiteID')$setElementAttribute('value', list('0032'))
# build the dates to loop through, formatted as mm/dd/yyyy
dates <- format(seq(as.Date('2013-01-01'), as.Date('2014-08-11'), by = 1), '%m/%d/%Y')

results <- lapply(dates, function(i, dr) {
  # change date
  dr$findElement('name', 'WucDates1:txtDate')$setElementAttribute('value', list(i))
  # click go
  dr$findElement('name', 'btnSmGOIcon')$clickElement()
  # extract table (the first three rows are header)
  html <- dr$findElement('id', 'gridForData')$getElementAttribute('outerHTML')[[1]]
  # readHTMLTable returns a list of tables when given HTML text; keep the first
  data <- readHTMLTable(html, skip.rows = 1:3)[[1]]
  # tag every row with the date it was scraped for
  data$date <- i
  data
}, dr = d)
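
To end up with the single data frame the question asks for, the per-date results can be stacked row-wise. This is only a minimal sketch, assuming each element of results is one data frame with the same columns; the results list below is invented stand-in data (the column names hour and count are hypothetical), not output from the DOT site.

```r
# stand-in for the list returned by lapply(); values are invented for illustration
results <- list(
  data.frame(hour = c("00", "01"), count = c(120, 95), date = "01/01/2013",
             stringsAsFactors = FALSE),
  data.frame(hour = c("00", "01"), count = c(110, 87), date = "01/02/2013",
             stringsAsFactors = FALSE)
)

# drop any dates that returned no table, then stack the rest row-wise
results <- Filter(Negate(is.null), results)
all_data <- do.call(rbind, results)
```

do.call(rbind, ...) requires matching column names across days, so it may be worth checking that first if the site's table layout varies.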

Upvotes: 3
