Reputation: 233
I want to scrape data from an HTTPS website where I have to log in to get the information.
Here is the first part of my code:
import requests
from lxml import html
import urllib2
from bs4 import BeautifulSoup
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
import MySQLdb
url = 'https://www.opten.hu/'
values = {'user': 'MYUSERNAME',
          'password': 'MYPASSWORD'}
r = requests.post(url, data=values)
params = {'Category': 6, 'deltreeid': 6, 'do': 'Delete Tree'}
url = 'https://www.opten.hu/cegtar/cegkivonat/0910000511'
result = requests.get(url, data=params, cookies=r.cookies)
print result
If I run it and print the result, I get "<Response [200]>", so it's OK: the server answered the HTTP request successfully.
After that I want to navigate to another menu item on this website (the second url), where the information valuable to me can be found.
How can I scrape that page, and what is wrong in my code?
import requests
from lxml import html
import urllib2
from bs4 import BeautifulSoup
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
import MySQLdb
url = 'https://www.opten.hu/'
values = {'user': 'MYUSERNAME',
          'password': 'MYPASSWORD'}
r = requests.post(url, data=values)
params = {'Category': 6, 'deltreeid': 6, 'do': 'Delete Tree'}
url = 'https://www.opten.hu/cegtar/cegkivonat/0910000511'
result = requests.get(url, data=params, cookies=r.cookies)
print result
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
print soup
Upvotes: 2
Views: 725
Reputation: 309
You are using urllib2
to read the content. That makes a second request to the url, but it does not send the cookies you received from the login request. Try the following code. I have used requests.Session
to persist the cookies, so you no longer need urllib2
at all.
# Author: Swapnil Mahajan
import requests
from bs4 import BeautifulSoup

# Log in once; the Session object keeps the cookies the server sets here
url = 'https://www.opten.hu/ousers/loginuser'
values = {'user': 'MYUSERNAME',
          'password': 'MYPASSWORD'}
session = requests.Session()
r = session.post(url, data=values)

# Fetch the protected page with the same session; its cookies are sent automatically.
# Use params= so these go into the query string, not the request body.
params = {'Category': 6, 'deltreeid': 6, 'do': 'Delete Tree'}
url = 'https://www.opten.hu/cegtar/cegkivonat/0910000511'
result = session.get(url, params=params)

soup = BeautifulSoup(result.text, "lxml")
print(soup)
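As an aside, here is a minimal offline sketch (not from the original answer) of why this works: a Session owns a single cookie jar that is consulted for every request made through it, so the login cookies from session.post are automatically sent with the later session.get. The cookie name sid and the domain example.com below are made up for illustration:

```python
import requests

session = requests.Session()

# Simulate a cookie that the server would normally set during login;
# the name and domain here are invented for the example
session.cookies.set("sid", "abc123", domain="example.com")

# The same jar backs every request made through this session, which is
# why the logged-in state survives between session.post and session.get
print(session.cookies.get("sid"))  # -> abc123
```

A plain requests.get call, by contrast, starts with an empty jar each time unless you pass cookies= explicitly, which is exactly what went wrong with the urllib2 call in the question.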
Upvotes: 1