Reputation: 2748
I have a small internal webpage that requires a login. When logged in, a simple HTML page is loaded, and JavaScript scripts then load the actual content of the page.
I want to log in programmatically and retrieve the content that the JavaScript loads.
I found a package called requests_html that sounds like it is meant for exactly this. I managed to use requests_html to log in to the page and get the HTML view of the page I want. It should then be possible to call
response.html.render()
and requests_html should then use pyppeteer, which downloads and launches a headless Chromium, loads the webpage, renders it, and returns the result. This actually works, but it only returns the login page: the session information from requests_html is not passed on to pyppeteer/Chromium.
Is it possible to use the same session, or do I need to try to log in using only pyppeteer?
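For reference, the pyppeteer-only fallback would mean driving the login form from the headless browser itself, roughly along these lines (a sketch only; the URL, field names, and selectors are hypothetical):
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto("https://example.com")  # hypothetical login URL
    # Fill in the login form (selectors are assumptions, adjust to the real page)
    await page.type('input[name="input_user"]', "user@example.com")
    await page.type('input[name="input_password"]', "hunter2")
    # Click submit and wait for the post-login navigation in parallel to avoid a race
    await asyncio.gather(
        page.waitForNavigation(),
        page.click('button[type="submit"]'),
    )
    # The page content is now the rendered, logged-in HTML
    print(await page.content())
    await browser.close()

asyncio.run(main())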
Here is a code example of my requests_html attempt, but you need a small webpage with a form login and JavaScript rendering to try it on:
from requests_html import HTMLSession
url = "https://example.com"
username = "[email protected]"
password = "hunter2"
session = HTMLSession()
payload = {
    "input_user": username,
    "input_password": password
}
response = session.post(url, data=payload)
# Logged in here
response = session.get(url)
response.html.render()
# Output from this shows login page
print(response.html.html)
Upvotes: 1
Views: 1904
Reputation: 155
Downloading the GitHub version (which I suppose is less stable) is not required. You can specify reload=False instead, so that Chromium renders the HTML your authenticated session already downloaded rather than re-requesting the URL without your cookies:
response.html.render(reload=False)
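In the context of the question's example, a minimal sketch of that change (the URL and credentials are placeholders):
from requests_html import HTMLSession

url = "https://example.com"  # placeholder URL
payload = {"input_user": "user@example.com", "input_password": "hunter2"}  # placeholder credentials

session = HTMLSession()
session.post(url, data=payload)      # log in; the auth cookies are stored on the session
response = session.get(url)          # fetch the page with the authenticated session
response.html.render(reload=False)   # render the already-downloaded HTML instead of re-fetching it
print(response.html.html)            # should now show the rendered, logged-in content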
Just saw that this is from 2019... I guess better late than never and yes that is what she said ;-)
Upvotes: 2
Reputation: 11
You can install the GitHub version of requests-html and use the following parameter to render():
response.html.render(send_cookies_session=True)
This maintains your session's login authorization in the Chromium page instance used for rendering.
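For comparison with the reload=False approach above, a minimal sketch of this flow. Only the install step and the render() call differ from the earlier sketch; as the answer notes, send_cookies_session is in the GitHub version rather than the PyPI release, and the install command and repository location below are assumptions:
# Development version from GitHub (repository location assumed):
#   pip install git+https://github.com/psf/requests-html
from requests_html import HTMLSession

session = HTMLSession()
# Log in with placeholder credentials; the auth cookies are stored on the session
session.post("https://example.com", data={"input_user": "user@example.com", "input_password": "hunter2"})
response = session.get("https://example.com")
# Copy the session's cookies into the Chromium page so it renders as a logged-in user
response.html.render(send_cookies_session=True)
print(response.html.html)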
Upvotes: 1