Reputation: 107
I need to log in to this site, http://bit.do, for scraping purposes. The data are password-protected, but I can't figure out how to log in and access them in R.
I tried:
library(rvest)
url <- "http://bit.d o/#login/admin"
pgsession <- html_session(url)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform,
                          'username' = "test0001",
                          'password' = "qwerty1234")
submit_form(pgsession, filled_form)
url <- 'http://bit.d o/admin/url/http%3A||2F||2Fedition.cnn.com||2F2017||2F07||2F21||2Fopinions||2Ftrump-russia-putin-lain-opinion||2Findex.html'
data_page <- read_html(url)
data_link <- html_nodes(data_page, 'td > a')
data_click <- html_nodes(data_page, 'td span:nth-child(1)')
but I get this error:
Submitting with 'NULL'
Error in xml2::url_absolute(form$url, session$url) :
Not compatible with STRSXP: [type=NULL].
What should I do? These are my test credentials: username test0001, password qwerty1234. Here's an example of the protected data I want to scrape: http://bit.d o/admin/url/http%3A||2F||2Fedition.cnn.com||2F2017||2F07||2F21||2Fopinions||2Ftrump-russia-putin-lain-opinion||2Findex.html
Important: due to a Stack Overflow restriction, I put a space between the d and the o in the domain name.
Upvotes: 0
Views: 281
Reputation: 2826
Since the form has no url field, calling submit_form(pgsession, filled_form) triggers a call to xml2::url_absolute(form$url, session$url), which fails because form$url is NULL. To get past this, you need to give form$url a value, even an empty one, before url_absolute uses it. Try adding the following line after you populate filled_form with set_values:
filled_form$url <- ''
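Putting it together, here is a minimal sketch of the whole flow. It assumes the older (pre-1.0) rvest API used in the question (html_session, set_values, submit_form, jump_to). The helper names ensure_form_url and scrape_after_login are hypothetical, not part of rvest. It also assumes the protected page should be fetched through the logged-in session rather than with a fresh read_html(url), since a fresh request does not carry the session's cookies. I have not run this against the site.

```r
# Hypothetical helper: give the form an (empty) url when it has none,
# so xml2::url_absolute() does not receive NULL.
ensure_form_url <- function(form, fallback = "") {
  if (is.null(form$url)) {
    form$url <- fallback
  }
  form
}

# Hypothetical wrapper around the steps from the question.
# login_url, data_url, username and password are supplied by the caller.
scrape_after_login <- function(login_url, data_url, username, password) {
  pgsession <- rvest::html_session(login_url)
  pgform <- rvest::html_form(pgsession)[[1]]
  filled_form <- rvest::set_values(pgform,
                                   username = username,
                                   password = password)
  filled_form <- ensure_form_url(filled_form)

  # Keep the session returned by submit_form: it holds the login cookies.
  logged_in <- rvest::submit_form(pgsession, filled_form)

  # Navigate within the same session instead of calling read_html(url).
  data_session <- rvest::jump_to(logged_in, data_url)
  data_page <- xml2::read_html(data_session)

  list(
    links  = rvest::html_nodes(data_page, "td > a"),
    clicks = rvest::html_nodes(data_page, "td span:nth-child(1)")
  )
}

# Usage (not run here): pass the login URL and the protected URL from the question.
# result <- scrape_after_login(login_url, data_url, "test0001", "qwerty1234")
```

If the login form is rendered by JavaScript (the #login/admin fragment hints at that, but I can't confirm it), html_form may not find the expected fields, and this approach would need adjusting.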
Upvotes: 1