Patrick Balada

Reputation: 1450

How can I follow the redirects of a URL in R?

Suppose I have the following url:

http://linkinghub.elsevier.com/retrieve/pii/S1755534516300379

When entering this into my standard desktop browser, I get redirected to:

http://www.sciencedirect.com/science/article/pii/S1755534516300379?via%3Dihub

However, I am not able to reproduce this in R. I tried the packages httr and RCurl. According to the httr documentation, calling the function GET as follows:

library(httr)
GET("http://linkinghub.elsevier.com/retrieve/pii/S1755534516300379")

is supposed to return the actual URL used (after any redirects). But when I extract the URL from the response:

GET("http://linkinghub.elsevier.com/retrieve/pii/S1755534516300379")$url

I don't get the final redirect target. I would very much appreciate your help!

Upvotes: 3

Views: 2038

Answers (2)

Tal Galili

Reputation: 25336

For future reference, here is a small code snippet I wrote to follow a redirect using HEAD (instead of GET, so as not to download more than needed). It will not work for the question at hand, but it may help people with simpler scenarios in the future.

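# FUNCTIONS
library(httr)

# Return the URL reached after following any HTTP redirects for a single URL.
url_after_redirect_1 <- function(url) {
  a <- HEAD(url)  # HEAD requests headers only, so the body is not downloaded
  # headers(a)
  # all_headers[[2]] fails when there is no redirect, so use the response's
  # url field, which holds the URL actually used after HTTP redirects
  a$url
}
# Vectorized wrapper that accepts a character vector of URLs
url_after_redirect <- Vectorize(url_after_redirect_1)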

Upvotes: 5

snaut

Reputation: 2535

The redirect on this site is performed with JavaScript, not HTTP. An HTTP client alone will therefore not follow it; you have to interpret the content of the downloaded document.

If you want to parse many documents from the same site, you could parse the redirect URL directly from the document.
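
As a rough illustration of parsing the target out of the document, here is a minimal sketch in base R. It assumes the page announces its redirect either through a meta refresh tag or a window.location assignment; I have not checked which mechanism this particular site uses, so the patterns, the helper name extract_redirect, and the sample HTML are all hypothetical and would need adapting to the real page source.

```r
# Hypothetical helper: pull a redirect target out of raw HTML text.
# Assumes a meta refresh tag or a window.location assignment;
# real pages may use something else entirely.
extract_redirect <- function(html) {
  patterns <- c(
    # <meta http-equiv="refresh" content="0; url=...">
    "(?i)http-equiv=[\"']?refresh[\"']?[^>]*?url=[\"']?([^\"'>\\s]+)",
    # window.location = "..." or window.location.href = '...'
    "(?i)window\\.location(?:\\.href)?\\s*=\\s*[\"']([^\"']+)[\"']"
  )
  for (p in patterns) {
    # regexec returns the full match followed by the capture groups
    m <- regmatches(html, regexec(p, html, perl = TRUE))[[1]]
    if (length(m) >= 2) return(m[2])
  }
  NA_character_  # no known redirect pattern found
}

# Made-up sample document, for illustration only
sample_html <- '<meta http-equiv="refresh" content="0; url=http://example.com/final">'
print(extract_redirect(sample_html))
```

With httr, the page text could be fetched with content(GET(url), as = "text") and passed to the helper. A regular expression is fragile here, so for anything beyond one known site a proper HTML parser would probably be the safer choice.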

If you want to parse many different sites with different redirect mechanisms, you will need a library that actually loads the page and runs the JavaScript, for example RSelenium.

Upvotes: 3
