h.l.m

Reputation: 13465

R parsing html to get a value

I would like to get the flood risk associated with any given postcode, and would like to use this website to do so, via the page you reach by clicking on text-only.

I would like to query this website by giving it a postcode and getting a TRUE or FALSE response, depending on whether the page comes back with Yes or No.

Below is the code I have written so far to try to do this.

However, my htmlresp_content object has a class of "HTMLInternalDocument" "HTMLInternalDocument" "XMLInternalDocument" "XMLAbstractDocument", and I do not know how to parse it or extract the relevant information from it.

    postcode_flood_risk <- function(PC){

        require(httr)

        htmlresp <- GET(paste0('http://maps.environment-agency.gov.uk/wiyby/wiybyController?value=',
                               gsub(' ','+',PC),
                               '&submit.x=-1&submit.y=11&submit=Search%09&lang=_e&ep=summary&topic=floodmap&layerGroups=default&scale=9&textonly=off'))

        htmlresp_content <- content(htmlresp)

        # code to extract the 'Yes' or 'No' from htmlresp_content
        # for now automatically choose yes
        flood_risk <- 'Yes'

        if(flood_risk=='Yes'){
            TRUE
        } else {
            FALSE
        }
    }

Upvotes: 0

Views: 91

Answers (1)

Rorschach

Reputation: 32426

You can use an XPath query on the parsed document to extract the response:

postcode_flood_risk <- function(PC){
    library(httr)
    library(XML)  # provides xmlValue(), which httr does not attach
    htmlresp <- GET(paste0('http://maps.environment-agency.gov.uk/wiyby/wiybyController?value=',
                           gsub(' ','+',PC),
                           '&submit.x=-1&submit.y=11&submit=Search%09&lang=_e&ep=summary&topic=floodmap&layerGroups=default&scale=9&textonly=off'))

    htmlresp_content <- content(htmlresp)

    # extract the 'Yes' or 'No': text of the second cell in the second table
    out <- htmlresp_content["//table[2]//td[2]//text()"]
    # strip the tabs, carriage returns and newlines around the value
    flood_risk <- gsub("\\t|\\r|\\n", "", xmlValue(out[[1]]))

    # TRUE only when the page answers 'Yes'
    !is.na(flood_risk) && flood_risk == 'Yes'
}

postcode_flood_risk("FY6 0AA")
# TRUE
postcode_flood_risk("FY6 0A9")
# FALSE
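
If you want to see what the XPath step does without hitting the website, here is a minimal sketch that runs the same query on an inline HTML string. The markup and the cell wording are made up for illustration and are only an assumption about the page layout (an answer sitting in the second cell of the second table); the variable names `html`, `cells` and `at_risk` are mine as well. It uses `htmlParse()` and `xpathSApply()` from the XML package.

```r
library(XML)

# Hypothetical markup standing in for the real page; the actual layout may differ.
html <- '<html><body>
<table><tr><td>Header</td></tr></table>
<table><tr><td>At risk of flooding?</td><td>
    Yes
</td></tr></table>
</body></html>'

# Parse the string as HTML (asText = TRUE: treat the input as content, not a file name)
doc <- htmlParse(html, asText = TRUE)

# Text nodes inside the second cell of the second table
cells <- xpathSApply(doc, "//table[2]//td[2]//text()", xmlValue)

# Trim surrounding whitespace, and guard against the query matching nothing
flood_risk <- if (length(cells) > 0) trimws(cells[[1]]) else NA_character_

# TRUE only when the cell text is exactly 'Yes'
at_risk <- isTRUE(flood_risk == "Yes")
```

The length check is there because indexing an empty result with `[[1]]` would throw an error rather than give you FALSE, which could happen if the page layout changes.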

Upvotes: 1
