user702432

Reputation: 12178

Scrape hyperlinks from an html page

I am trying to extract the latitudes and longitudes for the places listed on the right side of this page. I want to create a table like the following:

Place            Latitude   Longitude
Agarda           23.12604   87.19869
Ahanda           23.13099   87.18501
.....            .....      .....
West-Sanabandh   23.24876   86.99941

Is it possible to do this in R without calling up the individual hyperlinks for "Agarda", "Ahanda", etc. one at a time?

Upvotes: 0

Views: 137

Answers (2)

user1141165

Reputation: 11

It's possible to use RCurl to scrape each page in a loop or with sapply. If you combine it with some regex and/or readHTMLTable from the XML package (to identify the hyperlinks), it's a relatively straightforward function to write.
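
As a rough sketch of that approach (the index URL, the link pattern, and the Latitude/Longitude column names are placeholders, not the site's real structure):

    library(RCurl)
    library(XML)

    # Placeholder URL for the index page that lists the places
    index_url <- "http://example.com/places"
    index_doc <- htmlParse(getURL(index_url), asText = TRUE)

    # Pull the hyperlinks for the individual places (hypothetical pattern)
    links <- getHTMLLinks(index_doc)
    place_links <- links[grepl("place", links)]

    # Visit each place page and read the coordinates out of its first table
    coords <- t(sapply(place_links, function(u) {
      Sys.sleep(1)  # be polite between queries
      tab <- readHTMLTable(htmlParse(getURL(u), asText = TRUE),
                           stringsAsFactors = FALSE)[[1]]
      c(Latitude = tab$Latitude[1], Longitude = tab$Longitude[1])
    }))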

Within RCurl, it's possible to create a multi handle that makes the requests in parallel, although given the number of queries involved, it might be just as easy to run them serially and put a small Sys.sleep() between queries.
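
A sketch of the parallel variant, assuming the same place_links vector as above; getURIAsynchronous() pushes all of the requests through one multi handle:

    library(RCurl)

    # Fetch every place page concurrently through a single multi handle;
    # 'pages' is a character vector of HTML documents in the same order as
    # place_links, ready for htmlParse()/readHTMLTable() as in the serial version
    pages <- getURIAsynchronous(place_links)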

Upvotes: 1

Quentin

Reputation: 943500

The data appears on different pages. You can't get that data without requesting each page.

If R supports some form of parallelism, you can fetch the pages in parallel rather than one at a time.
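
R doesn't expose threads directly, but a minimal sketch of the same idea with the parallel package (the URLs are placeholders, assuming one hyperlink per place):

    library(parallel)

    # Placeholder URLs, one per place
    urls <- c("http://example.com/place/agarda",
              "http://example.com/place/ahanda")

    # Fetch the pages in separate worker processes (on Windows, mc.cores must be 1)
    pages <- mclapply(urls, function(u) readLines(u, warn = FALSE), mc.cores = 2)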

Upvotes: 3
