GET {httr} returns a Bad Request response

Question

I am trying to scrape HTML elements from the URL stored in searchlink. The only method that has worked for me is htmlTreeParse {XML}. However, it does not return the elements I'm looking for, for example: img[@title='Add to compare']

searchlink <- "http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1" 

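library(XML)

# Parse the page into an internal document so XPath queries can be run on it
doc <- htmlTreeParse(searchlink, useInternalNodes = TRUE)

# Collect the class attribute of every matching <img> element
classes <- xpathSApply(doc, "//img[@title='Add to compare']", function(x) xmlGetAttr(x, "class"))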

The result of running classes above:

list()

I have also tried readLines and GET {httr}, but both returned an error when reading the URL. I am guessing it's because of the special characters in the URL, but I don't know how to fix it. The response from GET is given below:

Response [http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1]
  Date: 2014-12-01 16:46
  Status: 400
  Content-type: text/html; charset=us-ascii
  Size: 324 B

Bad Request

Bad Request - Invalid URL
HTTP Error 400. The request URL is invalid.

sckott · Accepted Answer

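Try removing the # in the URL; I just replaced it with a ?. Everything after a # is a URL fragment, which is meant for the browser rather than the server, so the parameters need to follow a ? to be sent as a query string.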

library("httr")
url <- "http://www.realtor.ca/Map.aspx?CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1"
res <- GET(url)     # request the page using the query-string form of the URL
tt <- content(res)  # extract the body of the response

Then parse the HTML content in tt.
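A minimal sketch of both steps, under some assumptions: the URL is shortened for readability, and the HTML string is a made-up stand-in for the downloaded page (the class name compare-icon is hypothetical, not taken from realtor.ca). It swaps the # for a ? with base R's sub, then runs the XPath query from the question with the XML package. With a live response, something like content(res, as = "text") could be passed to htmlParse in place of the inline string.

```r
library(XML)

# Shortened version of the URL from the question (assumption: trimmed for readability)
searchlink <- "http://www.realtor.ca/Map.aspx#CultureId=1&CurrentPage=1"

# Replace the first literal "#" with "?" so the parameters become a query string
url <- sub("#", "?", searchlink, fixed = TRUE)

# Hypothetical stand-in for the page body; the real markup may differ
html <- '<html><body>
  <img title="Add to compare" class="compare-icon" src="a.png"/>
  <img title="Something else" class="misc" src="b.png"/>
</body></html>'

# Parse the text into an internal document that supports XPath
doc <- htmlParse(html, asText = TRUE)

# Pull the class attribute from each <img> whose title matches
classes <- xpathSApply(doc, "//img[@title='Add to compare']", xmlGetAttr, "class")
```

If classes still comes back empty on the real page, the matching elements may simply not be present in the HTML the server returns, for instance if they are added by JavaScript after the page loads.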
