scotti8

Reputation: 38

How to pass a session from requests to requests_html?

I want to scrape data from a site that requires a login. I used the requests library to log in, but I don't get the JavaScript-rendered data that way. So I also use requests_html to get the JS data, but now I can't pass the session from requests to requests_html, or reuse the active session to scrape.

I know that there is Selenium, but when I use it there is always a reCAPTCHA on the page, so I decided to use requests_html. If there are other options that might be easier, I will gladly accept suggestions.

Here is my code:

from requests_html import HTMLSession
import requests

url='...'
url2='...'

headers = {
...
}

data = {
  '_csrf': '...',
  'User[username]': '...',
  'User[password]': '...'
}
session = requests.Session()

session.post(url,headers=headers,data=data)
session = HTMLSession()
r = session.get(url2)

r.html.render()

print(r.html.html)

Upvotes: 2

Views: 3858

Answers (2)

Andres R

Reputation: 155

from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = '...'  # login page URL

payload = {
    'username': 'admin',
    'password': 'password',
    'Login': 'Login'
}

with HTMLSession() as c:
    # Fetch the login form first so the session receives its cookies.
    r = c.get(url)
    login_html = BeautifulSoup(
        r.html.html, "html.parser")

    # Copy hidden inputs (such as the CSRF token) into the payload.
    for tag in login_html.find_all('input'):
        if tag.get('type') == 'hidden' and tag.get('name'):
            payload[tag['name']] = tag.get('value', '')

    # Log in, then reuse the same session for the authenticated request.
    p = c.post(url, data=payload)
    r = c.get(
        'http://localhost/vulnerabilities/xss_r/?name=xx1xx')
    if 'Reflected' in r.text:
        print('test')

Upvotes: 0

crissal

Reputation: 2647

Why don't you use requests_html.HTMLSession as the session object instead of requests.Session?

It inherits from requests.Session, so it's perfectly capable of calling HTTP methods like post.

Upvotes: 2
