David Perea

Error in open.connection(con, "rb"): HTTP error 403 with RSelenium in Firefox

I have been running this code for a while and it always worked for me, but suddenly it returns the following error:

Error in open.connection(con, "rb") : HTTP error 403

I haven't changed anything, and I don't know why this could have happened. Any suggestions? Thank you!

    # Load the packages
    library(rvest)      # for read_html(), html_nodes() and html_text()
    library(magrittr)   # for the '%>%' pipe symbols
    library(RSelenium)  # to get the fully loaded HTML of the page
    library(purrr)      # for 'map_chr' to get the replies
    
    url_google <- list('https://play.google.com/store/apps/details?id=eu.acsi.europa&hl=es&gl=US&showAllReviews=true')
    
    for (apps in url_google) {
    
      # Specify the URL of the website to be scraped
      url <- apps
    
      # Start a local Selenium server (this is the only way to start RSelenium that is working for me atm)
      selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
      shell(selCommand, wait = FALSE, minimized = TRUE)
      remDr <- remoteDriver(port = 4567L, browserName = "firefox")
      remDr$open()
    
      # Go to the website
      remDr$navigate(url)
    
      # Get the page source and parse it as an HTML object with rvest
      html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()
    
      # 1) App name
      app <- html_obj %>% html_nodes(".AHFaub") %>% html_text()
    
      # 2) Name field (assuming that with 'name' you refer to the name of the reviewer)
      names <- html_obj %>% html_nodes(".kx8XBd .X43Kjb") %>% html_text()
    
      # ... (rest of the scraping code)
    
      remDr$close()
    }

Answers (1)

DanWaters

What worked for me was replacing your

    remDr <- remoteDriver(port = 4567L, browserName = "firefox")
    remDr$open()

with

    rD <- rsDriver(browser = "firefox",
                   check = FALSE)
    remDr <- rD[["client"]]
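A slightly fuller sketch of the same approach, wrapped in a function so the browser and the Selenium server are always shut down again, even if scraping fails. The function name `scrape_app_name` is hypothetical, the `.AHFaub` selector is taken from the question, and it assumes Firefox and geckodriver are already installed. It has not been tested against the current Play Store markup.

```r
# Hypothetical helper: start Firefox through rsDriver with check = FALSE,
# scrape the app name from one URL, then clean up.
scrape_app_name <- function(url, check = FALSE, port = 4567L) {
  # check = FALSE skips looking for / downloading new driver versions
  rD <- RSelenium::rsDriver(browser = "firefox", port = port,
                            check = check, verbose = FALSE)
  remDr <- rD[["client"]]  # rsDriver already opened this client

  # Close the browser and stop the Selenium server even if an error occurs
  on.exit({
    remDr$close()
    rD[["server"]]$stop()
  }, add = TRUE)

  remDr$navigate(url)
  html_obj <- rvest::read_html(remDr$getPageSource()[[1]])
  rvest::html_text(rvest::html_nodes(html_obj, ".AHFaub"))
}
```

Usage, with the question's list of URLs: `app <- scrape_app_name(url_google[[1]])`. Starting one server per URL is slow; for many URLs you could instead start `rsDriver` once outside the loop and only call `navigate()` inside it.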

It isn't the rsDriver command itself that fixes it, but the argument check = FALSE. At least in my case this was a curl issue: on startup it tries to download new versions of each of the browser drivers, and that download was failing. Setting check = FALSE turns that download step off.
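If you would rather keep starting the server yourself, as in the question, the same idea should presumably apply to `wdman::selenium()`, whose documentation lists its own `check` argument. A minimal sketch, assuming your installed wdman version accepts `check = FALSE`, that the Selenium and geckodriver binaries were already downloaded by an earlier run with checking enabled, and Windows (because `shell()` is Windows-only, as in the question). The function name `start_selenium_firefox` is hypothetical.

```r
# Hypothetical helper: the question's manual server start, minus the driver check.
start_selenium_firefox <- function(port = 4567L) {
  selCommand <- wdman::selenium(
    port = port,
    check = FALSE,      # assumption: skips the version check / download step
    retcommand = TRUE   # only return the java command, do not launch it
  )
  # Launch the server in the background (Windows-only, as in the question)
  shell(selCommand, wait = FALSE, minimized = TRUE)

  remDr <- RSelenium::remoteDriver(port = port, browserName = "firefox")
  remDr$open()
  remDr
}
```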
