Reputation: 11
I am trying to use R to trigger a file download on this site: http://www.regulomedb.org. Basically, an ID, e.g., rs33914668, is input in the form, click Submit. Then in the new page, click Download in the bottom left corner to trigger a file download.
I have tried rvest with the help from other posts.
library(httr)
library(rvest)
library(tidyverse)
pre_pg <- read_html("http://www.regulomedb.org")
POST(
url = "http://www.regulomedb.org",
body = list(
data = "rs33914668"
),
encode = "form"
)
) -> res
pg <- content(res, as="parsed")
By checking pg, I think I am still on the first page, not the http://www.regulomedb.org/results. (I don't know how to check pg list other than reading it line by line). So, I cannot reach the download button. I cannot figure out why it cannot jump to the next page.
By learning from some other posts, I managed to download the file without using rvest.
library(httr)
library(rvest)
library(RCurl)
session <- html_session("http://www.regulomedb.org")
form <- html_form(session)[[1]]
filledform <- set_values(form, `data` = "rs33914668")
session2 <- submit_form(session, filledform)
form2 <- html_form(session2)[[1]]
filledform2 <- set_values(form2)
thesid <- filledform2[["fields"]][["sid"]]$value
theurl <- paste0('http://www.regulomedb.org/download/',thesid)
download.file(theurl,destfile="test.bed",method="libcurl")
In filledform2, I found the sid. Using www.regulomedb.org/download/:sid, I can download the file.
I am new to html or even R, and don't even know what sid is. Although I made it, I am not satisfied with the coding. So, I hope some experienced users can provide better, alternative solutions, or improve my current solution. Also, what is wrong with the POST/rvest method?
Upvotes: 1
Views: 1366
Reputation: 1618
url<-"http://www.regulomedb.org/"
library(rvest)
page<-html_session(url)
download_page<-rvest:::request_POST(page,url="http://www.regulomedb.org/results",
body=list("data"="rs33914668"),
encode = 'form')
#This is a unique id on generated based on your query
sid<-html_nodes(download_page,css='#download > input[type="hidden"]:nth-child(8)') %>% html_attr('value')
#This is a UNIX time
download_token<-as.numeric(as.POSIXct(Sys.time()))
download_page1<-rvest:::request_POST(download_page,url="http://www.regulomedb.org/download",
body=list("format"="bed",
"sid"=sid,
"download_token_value_id"=download_token ),
encode = 'form')
writeBin(download_page1$response$content, "regulomedb_result.bed")
Upvotes: 2