Reputation: 335
How can I check my session cookies and specify those cookies before making a subsequent web request?
I want to scrape a page, but I can't work out how to send the cookies with my request.
I'm using the rvest library.
My code:
library(rvest)
WP <- html_session("http://www.wp.pl/")
headers <- httr::headers(WP)
cookies <- unlist(headers[names(headers) == "set-cookie"])
crumbs <- stringr::str_split_fixed(cookies, "; ", 4)
# method 1
stringr::str_split_fixed(crumbs[, 1], "=", 2)
# method 2
httr::cookies(WP)  # cookies() comes from httr, which is not attached here
How do I set my cookies to do the web scraping?
Upvotes: 4
Views: 2817
Reputation: 15373
Here's some code that'll do the trick:
library(httr)
library(rvest)
httr::GET("http://www.wp.pl/",
          set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                      `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                      `__utmb` = "29983421.5.10.1413649536",
                      `__utmc` = "29983421",
                      `__utmt` = "1",
                      `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)")) %>%
  read_html %>%              # Sample rvest code: parse the response body
  html_table(fill = TRUE)    # Sample rvest code: extract the tables on the page
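If you'd rather not hard-code the values, here's a rough sketch of reading the cookies from a first response and sending them with the next request. The helper names `cookie_vector` and `fetch_with_cookies` are made up for illustration; the sketch assumes `httr::cookies()` returns a data frame with `name` and `value` columns and relies on the `.cookies` argument of `set_cookies()`. Note that httr normally keeps cookies across requests to the same host within an R session anyway, so this is mainly useful when you want to inspect or change them first.

```r
# Hypothetical helper: turn the data frame returned by httr::cookies()
# (assumed to have `name` and `value` columns) into a named character vector.
cookie_vector <- function(cookie_df) {
  stats::setNames(as.character(cookie_df$value), cookie_df$name)
}

# Hypothetical helper: fetch a first page, read its cookies, and send them
# explicitly with a second request. Nothing touches the network until called.
fetch_with_cookies <- function(first_url, next_url = first_url) {
  first <- httr::GET(first_url)
  jar <- cookie_vector(httr::cookies(first))  # inspect `jar` to check the cookies
  httr::GET(next_url, httr::set_cookies(.cookies = jar))
}
```

Something like `fetch_with_cookies("http://www.wp.pl/") %>% read_html` should then slot into the same rvest pipeline as above.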
Upvotes: 2