Reputation: 38
I want to scrape data from a site that requires a login. I used the requests library to log in, but I don't get the JavaScript-rendered data that way. So I also use requests_html to get the JS data, but now I can't pass the session from requests to requests_html or reuse the active session for scraping.
I know about Selenium, but whenever I use it there is a reCAPTCHA on the page, so I decided to use requests_html. If there are other, possibly easier options, I will gladly accept suggestions.
Here is my code:
from requests_html import HTMLSession
import requests
url='...'
url2='...'
headers = {
...
}
data = {
'_csrf': '...',
'User[username]': '...',
'User[password]': '...'
}
session = requests.Session()
session.post(url,headers=headers,data=data)
session = HTMLSession()
r = session.get(url2)
r.html.render()
print(r.html.html)
Upvotes: 2
Views: 3858
Reputation: 155
from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = '...'  # login page URL, as in the question

payload = {
    'username': 'admin',
    'password': 'password',
    'Login': 'Login'
}

with HTMLSession() as c:
    # Fetch the login page to pick up the session cookie and hidden form fields.
    r = c.get(url)
    login_html = BeautifulSoup(r.html.html, "html.parser")

    # Copy every hidden input (e.g. the CSRF token) into the login payload.
    for tag in login_html.find_all('input'):
        if tag.get('type') == 'hidden' and tag.get('name'):
            payload[tag['name']] = tag.get('value', '')

    # Log in; the session keeps the cookies for the requests that follow.
    p = c.post(url, data=payload)

    r = c.get('http://localhost/vulnerabilities/xss_r/?name=xx1xx')
    if 'Reflected' in r.text:
        print('test')
Upvotes: 0
Reputation: 2647
Why don't you use requests_html.HTMLSession as the session object instead of requests.Session? It inherits from requests.Session, so it's perfectly capable of calling HTTP methods like post.
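A minimal sketch of that idea, assuming the login form accepts a plain POST with the fields from the question. The helper name fetch_rendered_page and its parameters are made up for illustration; only post, get, raise_for_status, r.html.render() and r.html.html come from requests / requests_html. The point is that a single HTMLSession does both the login and the rendered request, so the cookies are shared.

```python
def fetch_rendered_page(login_url, target_url, form_data, headers=None, session=None):
    """Log in and fetch a JS-rendered page through one shared session.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if session is None:
        # HTMLSession subclasses requests.Session, so it can post the login form too.
        from requests_html import HTMLSession
        session = HTMLSession()

    # Log in; the session stores whatever cookies the site sets.
    login = session.post(login_url, data=form_data, headers=headers)
    login.raise_for_status()

    # Same session, so this request is sent with the login cookies.
    page = session.get(target_url)
    page.html.render()  # runs the page's JavaScript in headless Chromium
    return page.html.html
```

With the variables from the question this would be called as fetch_rendered_page(url, url2, data, headers=headers). Note that render() downloads Chromium the first time it runs, and if the site rotates the _csrf value per session, the token still has to be read from the login page first.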
Upvotes: 2