user3334472

Reputation: 149

Parse webpage and convert to data.frame

I am trying to scrape material from this website: http://www.appliedsolutions.org/site/308/Local-Government/Local-Government-Affiliates

Specifically, I am interested in extracting the values from the JavaScript that appears around line 598 of the page source:

 {
            "title": 'Coconino County',
            "lat": '35.7714',
            "lng": '-111.5111',
            "description": 'Coconino County, AZ <br/> <a href="http://www.coconino.az.gov/" target="_blank"> http://www.coconino.az.gov/</a> <br/>  '
        }

    ,

         {
            "title": 'City of Flagstaff',
            "lat": '35.1981',
            "lng": '-111.6506',
            "description": 'City of Flagstaff, AZ <br/> <a href="http://www.flagstaff.az.gov/   " target="_blank"> http://www.flagstaff.az.gov/   </a> <br/>  '
        }

Ideally, I would like to bring the "title", "lat", and "lng" values into an R data.frame.

I have used the readLines function in R to read the page, but I am having trouble reducing the HTML to isolate the data I need.

Upvotes: 3

Views: 387

Answers (1)

Rorschach

Reputation: 32446

Here is one way, using the RSelenium package.

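## Start RSelenium and navigate to the page so the browser runs its JavaScript
## (newer RSelenium releases replace checkForServer()/startServer() with rsDriver())
library(RSelenium)
RSelenium::checkForServer()         # download the Selenium server if it is missing
RSelenium::startServer()            # launch the local Selenium server
remDr <- remoteDriver()
remDr$open()
remDr$setImplicitWaitTimeout(3000)  # wait up to 3000 ms when locating elements
remDr$navigate("http://www.appliedsolutions.org/site/308/Local-Government/Local-Government-Affiliates")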

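EDIT: this is much simpler, per @jdharrison's suggestion. The page keeps its data in a JavaScript variable named markers, so the browser can return it directly: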

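## Ask the browser for the page's "markers" variable; it comes back as a list of records
appData <- remDr$executeScript("return markers;")

## Bind the records into one data.frame and keep only the columns of interest
dat <- do.call(rbind.data.frame, appData)
dat <- dat[,c("title","lat","lng")]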

> head(dat)
        lat       lng               title
   35.7714 -111.5111     Coconino County
   35.1981 -111.6506   City of Flagstaff
   34.8697 -111.7603      City of Sedona
   34.6503 -112.4147      Yavapai County
     32.64 -117.0833 City of Chula Vista
   38.8056 -123.0161  City of Cloverdale
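
If running a Selenium server is not an option, the same fields can probably be pulled out of the raw page text with a regular expression. The following is only a sketch of that alternative in base R, not part of the RSelenium approach above. It assumes every value is a single-quoted string with no embedded apostrophes, as in the snippet from the question. The helper name extract_field is hypothetical, and the sample text is abbreviated from the question rather than read from the live page (in practice it would be something like paste(readLines(url), collapse = "\n")).

```r
# Sample text abbreviated from the question; descriptions are shortened.
js <- paste(
  "{",
  "    \"title\": 'Coconino County',",
  "    \"lat\": '35.7714',",
  "    \"lng\": '-111.5111',",
  "    \"description\": 'Coconino County, AZ <br/>'",
  "},",
  "{",
  "    \"title\": 'City of Flagstaff',",
  "    \"lat\": '35.1981',",
  "    \"lng\": '-111.6506',",
  "    \"description\": 'City of Flagstaff, AZ <br/>'",
  "}",
  sep = "\n"
)

# Hypothetical helper: return every single-quoted value that follows "key":
extract_field <- function(text, key) {
  pattern <- sprintf("\"%s\":\\s*'([^']*)'", key)
  # All full matches, e.g. "title": 'Coconino County'
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  # Keep only the captured value between the single quotes
  sub(pattern, "\\1", hits, perl = TRUE)
}

# Assemble the data.frame, converting the coordinates to numeric
dat <- data.frame(
  title = extract_field(js, "title"),
  lat   = as.numeric(extract_field(js, "lat")),
  lng   = as.numeric(extract_field(js, "lng")),
  stringsAsFactors = FALSE
)
```

This is more brittle than asking the browser for the markers variable: a title containing an escaped apostrophe, or a change in the page's quoting style, would break the pattern.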

Upvotes: 5
