Reputation: 23
(Sorry for my English, I'll do my best.)
I'm new to Python and I'm looking for help with some web scraping. I already have working code that gets the links I want, but the website is password-protected. With the help of many questions I read here, I managed to get code that scrapes the website after login, but the links I want are on another page:
The login page is http://fantasy.trashtalk.co/login.php
The landing page after login (the one this code scrapes) is http://fantasy.trashtalk.co/
The page I want is http://fantasy.trashtalk.co/?tpl=classement&t=1
So I have this code (some imports are probably unnecessary; they come from another script):
from bs4 import BeautifulSoup
import requests
from lxml import html
import urllib.request
import re
username = 'myusername'
password = 'mypass'
url = "http://fantasy.trashtalk.co/?tpl=classement&t=1"
log = "http://fantasy.trashtalk.co/login.php"
values = {'email': username,
          'password': password}
r = requests.post(log, data=values)
# Not sure about the code below but it works.
data = r.text
soup = BeautifulSoup(data, 'lxml')
tags = soup.find_all('a')
for link in soup.findAll('a', attrs={'href': re.compile("^https://")}):
    print(link.get('href'))
I understand that this code only lets me post to the login page and then scrape what comes next (the landing page), but I can't figure out how to "save" my login info so I can access the page I want to scrape.
I think I should add something like this after the login code, but when I do, it only scrapes the links from the login page:
s = requests.get(url)
I also read some topics here that use a "with session" construct, but I didn't manage to make it work.
Any help would be appreciated. Thank you for your time.
Upvotes: 2
Views: 3809
Reputation: 1071
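The issue is that your login has to be posted through a Session object rather than through the module-level requests.post function. Each standalone requests.post or requests.get call starts without cookies, so whatever cookie the login sets is gone by the next request. A session keeps the cookies it receives and sends them with later requests. I've modified your code below; it logs in and then gives you access to the HTML tags on the scrape_url page. Good luck!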
import requests
from bs4 import BeautifulSoup

username = 'email'
password = 'password'
scrape_url = 'http://fantasy.trashtalk.co/?tpl=classement&t=1'
login_url = 'http://fantasy.trashtalk.co/login.php'
login_info = {'email': username, 'password': password}

# Start a session so cookies set at login are reused on later requests.
session = requests.Session()
# Log in with your credentials.
session.post(login_url, data=login_info)
# Request the page you want to scrape, using the same session.
response = session.get(scrape_url)
soup = BeautifulSoup(response.content, 'html.parser')
# href=True skips <a> tags without an href, which would otherwise raise KeyError.
for link in soup.find_all('a', href=True):
    print('\nLink href: ' + link['href'])
    print('Link text: ' + link.text)
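If you prefer the "with session" form you mentioned, here is a minimal sketch of the same flow wrapped in two functions. The names extract_links and scrape_after_login are hypothetical helpers of mine, not part of requests or BeautifulSoup, and the redirect check assumes the site sends logged-out visitors back to login.php, which I haven't verified.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_links(html_text, base_url):
    """Return (absolute_url, link_text) pairs for every <a> that has an href."""
    soup = BeautifulSoup(html_text, "html.parser")
    return [
        (urljoin(base_url, a["href"]), a.get_text(strip=True))
        for a in soup.find_all("a", href=True)
    ]


def scrape_after_login(login_url, scrape_url, credentials):
    """Log in, then fetch scrape_url with the same cookies (hypothetical helper)."""
    # The with-block closes the session's connections when it exits.
    with requests.Session() as session:
        login = session.post(login_url, data=credentials, timeout=10)
        login.raise_for_status()  # only catches HTTP errors (4xx/5xx)

        page = session.get(scrape_url, timeout=10)
        page.raise_for_status()
        # Assumption: a rejected login bounces you back to the login page.
        if page.url.startswith(login_url):
            raise RuntimeError("Still on the login page; check the form field names")
        return extract_links(page.text, page.url)
```

You would call it as scrape_after_login(login_url, scrape_url, login_info) and loop over the returned pairs. Resolving each href with urljoin means relative links come back as full URLs, and keeping the parsing in its own function lets you try it on a saved HTML string without logging in.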
Upvotes: 3