pythoncoder

Reputation: 51

Python Web automation: HTTP Requests OR Headless browser

I am confused about this topic. I built a bot for two different websites, using Python's requests module to manually simulate sending HTTP POST and GET requests.

I implemented SOCKS proxies and also set user agents in my requests, as well as referrer URLs when necessary, to make the traffic look genuine. (I verified the actual requests a browser sends on these sites using Burp Suite.)

However, every account I run through my bots keeps getting suspended, which got me wondering what I'm doing wrong. A friend suggested that I use a headless solution such as PhantomJS, and I am leaning towards that route, but I am still confused: what is the difference between sending HTTP requests with the requests module and using a headless browser like PhantomJS?

I am not sure whether I need to paste my source code here; I'm just looking for some direction on this project. Thank you for taking the time to read such a long wall of text :)

Upvotes: 2

Views: 3569

Answers (1)

Federico Rubbi

Reputation: 734

You probably need to set cookies.

To make your requests look more genuine, set the other headers a browser sends, such as Host and Referer. The Cookie header, however, changes from session to session, so it can't be hard-coded. You can obtain fresh cookies like this:

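from requests import Session

# 'your_url', your_headers and proxies are placeholders for your own values.
with Session() as session:
    # First request: the server sets its cookies on the response.
    response = session.get('your_url', headers=your_headers, proxies=proxies)  # add params=... if needed
    cookies = response.cookies.get_dict()

    # Later requests: send the cookies back. The Session also stores and
    # resends them on its own, so passing cookies= explicitly is optional.
    response = session.get('your_url', headers=your_headers, cookies=cookies, proxies=proxies)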

Or maybe the site is detecting bots in some other way.

In this case, you could try adding a delay between requests with time.sleep(). You can see the timing of real requests in your browser's Dev Tools. Alternatively, you could emulate all the requests your browser sends when it connects to the site, including the AJAX calls made by its scripts.
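As a rough illustration of the delay idea, here is a minimal sketch that pauses for a random interval between consecutive requests. The helper name fetch_with_delays and its parameters (min_delay, max_delay, sleep) are made up for this example, and the delay bounds are arbitrary assumptions; pick values based on the timings you actually observe in Dev Tools.

```python
import random
import time


def fetch_with_delays(session, urls, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Fetch each URL in order, pausing a random interval between requests.

    `session` is any object with a .get(url) method, e.g. a requests.Session.
    The name and default delays are hypothetical, chosen for illustration.
    """
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random pause so the requests are not evenly spaced.
            sleep(random.uniform(min_delay, max_delay))
        responses.append(session.get(url))
    return responses
```

With a requests Session that already carries your headers and proxies, a call would look like fetch_with_delays(session, ['your_url', 'your_other_url']). The sleep parameter exists only so the pause can be swapped out when trying the function without waiting.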

In my experience, choosing between requests and Selenium WebDriver doesn't make much difference in terms of detection, because with Selenium you can't access the headers, or even the request and response data. Also, note that PhantomJS is no longer maintained; headless Chrome is the preferred option instead.

If none of the requests-based approaches works, I suggest using Selenium Wire or Mobilenium, modified versions of Selenium that give access to request and response data.

Hope it helps.

Upvotes: 2
