rocket_c

Reputation: 11

How to trigger a file download using R

I am trying to use R to trigger a file download from this site: http://www.regulomedb.org. The manual workflow is: enter an ID (e.g., rs33914668) in the form and click Submit; then, on the results page, click Download in the bottom-left corner to trigger the file download.

I have tried httr and rvest, with help from other posts.

library(httr)
library(rvest)
library(tidyverse)

pre_pg <- read_html("http://www.regulomedb.org")
res <- POST(
  url = "http://www.regulomedb.org",
  body = list(
    data = "rs33914668"
  ),
  encode = "form"
)
pg <- content(res, as="parsed")

Judging from pg, I think I am still on the first page rather than on http://www.regulomedb.org/results (I don't know how to inspect pg other than by reading it line by line), so I cannot reach the Download button. I cannot figure out why the request does not advance to the results page.
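One way to inspect a parsed page without reading it line by line is to pull out a few identifying pieces, such as the title, the forms, and the input names. The sketch below runs offline on a hypothetical HTML fragment (the real RegulomeDB markup may differ) and uses only rvest selectors. For a live httr response, `res$url` and `status_code(res)` should similarly show which URL was actually returned and with what status.

```r
library(rvest)

# Hypothetical stand-in for the HTML returned by the request;
# the real page's markup is an assumption here.
html <- '<html><head><title>Results</title></head><body>
<form id="download" action="/download" method="post">
<input type="hidden" name="format" value="bed">
<input type="hidden" name="sid" value="abc123">
</form></body></html>'

pg <- read_html(html)

# The page title is a quick way to tell the search page from the results page.
page_title <- html_text(html_node(pg, "title"))

# List every form's action and the names of the inputs inside forms.
form_actions <- html_attr(html_nodes(pg, "form"), "action")
input_names <- html_attr(html_nodes(pg, "form input"), "name")

print(page_title)
print(form_actions)
print(input_names)
```

If the expected form or input names are missing from the output, the request most likely returned a different page than intended.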

Based on some other posts, I managed to download the file with rvest's session and form functions instead of a direct POST.

library(httr)
library(rvest)
library(RCurl)

session <- html_session("http://www.regulomedb.org")
form <- html_form(session)[[1]]
filledform <- set_values(form, `data` = "rs33914668")

session2 <- submit_form(session, filledform)
form2 <- html_form(session2)[[1]]
filledform2 <- set_values(form2)

thesid <- filledform2[["fields"]][["sid"]]$value
theurl <- paste0('http://www.regulomedb.org/download/',thesid)
download.file(theurl,destfile="test.bed",method="libcurl")

I found the sid field in filledform2. Requesting www.regulomedb.org/download/:sid with that value downloads the file.

I am new to HTML and even to R, and I don't know what sid is. Although this works, I am not satisfied with the code. I hope experienced users can suggest better alternatives or improve my current solution. Also, what is wrong with the POST/rvest approach?

Upvotes: 1

Views: 1366

Answers (1)

Bharath

Reputation: 1618

library(rvest)

url <- "http://www.regulomedb.org/"
page <- html_session(url)

# Submit the query to /results rather than to the home page.
# request_POST is an unexported rvest function, hence the `:::`.
download_page <- rvest:::request_POST(page, url = "http://www.regulomedb.org/results",
                                      body = list("data" = "rs33914668"),
                                      encode = "form")

# This is a unique ID generated based on your query
sid <- html_nodes(download_page, css = '#download > input[type="hidden"]:nth-child(8)') %>%
  html_attr("value")

# This is a UNIX timestamp (seconds since the epoch)
download_token <- as.numeric(as.POSIXct(Sys.time()))

# Request the BED file and write the raw response body to disk
download_page1 <- rvest:::request_POST(download_page, url = "http://www.regulomedb.org/download",
                                       body = list("format" = "bed",
                                                   "sid" = sid,
                                                   "download_token_value_id" = download_token),
                                       encode = "form")
writeBin(download_page1$response$content, "regulomedb_result.bed")
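The positional selector `:nth-child(8)` breaks if the site reorders its hidden inputs. Since the question's form exposes a field named sid, selecting the input by its name attribute may be more robust. The following is a minimal offline sketch: the HTML fragment and the helper `get_hidden_value` are hypothetical, for illustration only, and the real page's markup may differ.

```r
library(rvest)

# Hypothetical results-page fragment; the real markup is an assumption.
html <- '<form id="download" action="/download" method="post">
  <input type="hidden" name="format" value="bed">
  <input type="hidden" name="sid" value="abc123">
</form>'

# Hypothetical helper: return the value attribute of the input with the
# given name inside the form matched by form_css.
get_hidden_value <- function(page, form_css, field) {
  selector <- sprintf('%s input[name="%s"]', form_css, field)
  html_attr(html_node(page, selector), "value")
}

page <- read_html(html)
sid <- get_hidden_value(page, "#download", "sid")
print(sid)
```

With a live session, the same helper could be called on the object returned by the /results request in place of the `html_nodes(...)` line above.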

Upvotes: 2
