Crawl data from a website using python

Question

I would like to crawl some data from a website. To manually access the target data, I need to log in and then click on some buttons on to finally get the target html page. Currently, I am using the Python request library to simulate this process. I am doing like this:

ss = requests.session()
#log in
resp = ss.post(url, data = (('username', 'xxx'), ('password', 'xxx')))
#then send requests to the target url
result = ss.get(taraget_url)

However, I found that the final request did not return me what I want.

So I changed the method. I download all the network traffic and look into the headers and cookies of the last request. I found that here are some contents that are different in each log in session like the sessionid and some other variables. So I traces back when these varibales are returned in the response and then get the values again by sending the corresponding requests. After this, I construct the correct headers and cookies and then send request like this:

resp = ss.get(target_url, headers = myheader, cookies = mycookie)

But still, it does not return me anything. Anyone can help?

Crawl data from a website using python

Answers (1)

Related Questions