Reputation: 950
I am trying to grab tables from a Department of Transportation (DOT) traffic count webpage (South Carolina, US). In this particular case, I am trying to scrape all tables for a particular location (textBox1) and a range of dates (textBox2). I've done some basic web scraping, but this is well beyond my current knowledge.
This is the webpage (add http -- don't know why they don't have a domain for this): 206.74.144.15/Poll5WebAppPublic/wfrm/wfrmViewDataNightly.aspx
The first text box (site) corresponds to the following line:
<input name="WucSites1:txtSiteID" value="0032" maxlength="4" size="1" id="WucSites1_txtSiteID" class="TextBox" style="width:40px;" type="text">
The second textbox (date) corresponds to the following line:
<input name="WucDates1:txtDate" value="08/10/2014" maxlength="10" size="8" id="WucDates1_txtDate" class="TextBox" style="width:88px;" type="text">
The "Go" button corresponds to the following line:
<input name="btnSmGOIcon" id="btnSmGOIcon" tabindex="1" title="Get Data for Chosen Site" class="cssLargeIcons" onmouseover="this.src = '../images/iconImages/GO_o.png';" onmouseout="this.src = '../images/iconImages/GO.png';" src="../images/iconImages/GO.png" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("btnSmGOIcon", "", true, "", "", false, false))" language="javascript" border="0" type="image">
I want to set the location/site to "32", and rotate through the calendar days from 01/01/2013 until 08/11/2014 while picking all the tables and binding them into a single dataframe.
Thank you very much in advance
Upvotes: 1
Views: 505
Reputation: 115390
This is a perfect job for RSelenium.
Once you have the appropriate software installed, something along these lines should work:
library(RSelenium)
library(XML)

# start a local Selenium server; startServer() is defunct in later RSelenium
# releases, where rsDriver() or a Docker-hosted server is used instead
startServer()
d <- remoteDriver$new()

# open and navigate to page
d$open()
d$navigate('http://206.74.144.15/Poll5WebAppPublic/wfrm/wfrmViewDataNightly.aspx')

# set site = 0032
d$findElement('name', 'WucSites1:txtSiteID')$setElementAttribute('value', list('0032'))

# build the dates to loop through, formatted as mm/dd/yyyy strings
dates <- strftime(seq.Date(as.Date('01/01/2013', format = '%m/%d/%Y'),
                           as.Date('08/11/2014', format = '%m/%d/%Y'),
                           by = 1),
                  '%m/%d/%Y')

results <- lapply(dates, function(i, dr) {
  ii <- force(i)
  # change date
  dr$findElement('name', 'WucDates1:txtDate')$setElementAttribute('value', list(ii))
  # click go
  dr$findElement('name', 'btnSmGOIcon')$clickElement()
  # extract table (the first three lines are header)
  data <- readHTMLTable(dr$findElement('id', 'gridForData')$getElementAttribute('outerHTML')[[1]],
                        skip.rows = 1:3)
  # tag the result with the date it was fetched for
  data$date <- ii
  return(data)
}, dr = d)
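The question also asks for everything in a single data frame, and the loop above stops at a list with one entry per date. Here is a minimal sketch of that last step, assuming each element of `results` is a plain data frame with identical columns. The `mock_results` list and its `hour` and `count` columns are made up purely for illustration and are not the real columns of the DOT table.

```r
# Hypothetical stand-in for `results`: one data frame per date, same columns.
# Column names and values are invented for illustration only.
mock_results <- list(
  data.frame(hour = c("00:00", "01:00"), count = c(12L, 7L),
             date = "01/01/2013", stringsAsFactors = FALSE),
  data.frame(hour = c("00:00", "01:00"), count = c(15L, 9L),
             date = "01/02/2013", stringsAsFactors = FALSE)
)

# Stack the per-date tables row-wise into a single data frame.
all_counts <- do.call(rbind, mock_results)

# Parse the date strings so the combined table can be sorted or filtered by day.
all_counts$date <- as.Date(all_counts$date, format = "%m/%d/%Y")
```

With the real output you would pass `results` instead of `mock_results`. If each element turns out to be a list of tables rather than a single data frame, pull the data frame out of each element first and then bind.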
Upvotes: 3