northtree

Reputation: 9265

How to render JS to generate fingerprint for cookie?

This website uses JavaScript to set a cookie.

How can I run that JS so my request looks like it comes from a browser and avoids the 429 error?

from requests_html import HTMLSession

with HTMLSession() as s:
  url = 'https://www.realestate.com.au/auction-results/nsw'
  r = s.get(url)
  print(r.status_code)
  print(r.text)

  r.html.render()
  # render() updates r.html, not r.text, so print the rendered HTML
  print(r.html.html)

Upvotes: 2

Views: 1933

Answers (1)

manwithfewneeds

Reputation: 1167

It appears to be nearly impossible to get past the fingerprint check without some form of browser simulation (and even with Selenium, I had to set some options). Here's what I came up with: Selenium fetches the only critical piece of information needed to make requests, a cookie named 'FGJK', which is then sent in the headers of the subsequent requests, and async code grabs all the suburb result pages.

from requests_html import AsyncHTMLSession
import asyncio
from selenium import webdriver
import nest_asyncio

#I'm using IPython which doesn't like async unless the following is applied:
nest_asyncio.apply()

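async def get_token(timeout=30):
    options = webdriver.ChromeOptions()
    # Drop the 'enable-automation' switch (one of the options I had to set)
    options.add_experimental_option('excludeSwitches', ['enable-automation'])
    driver = webdriver.Chrome(options=options)
    try:
        driver.get('https://www.realestate.com.au/auction-results/nsw')
        # Poll twice a second until the page's JS has set the FGJK cookie
        for _ in range(timeout * 2):
            cookie = driver.get_cookie('FGJK')
            if cookie:
                return cookie['value']
            await asyncio.sleep(0.5)
        raise TimeoutError('FGJK cookie was not set in time')
    finally:
        # Close the browser whether or not the cookie was found
        driver.quit()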


async def get_results(s, endpoint, headers):
    r = await s.get(f'https://www.realestate.com.au/auction-results/{endpoint}', headers=headers)
    #do something with r.html
    print(r, endpoint)


async def main():
    token = await get_token()
    s = AsyncHTMLSession()
    headers = {'Cookie': f'FGJK={token}',
               'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}    

    r = await s.get('https://sales-events-api.realestate.com.au/sales-events/nsw')
    suburbs = r.json()['data']['suburbResults']
    endpoints = [burb['suburb']['urlValue'] for burb in suburbs]
    # gather() must be awaited, otherwise main() returns before the requests finish
    await asyncio.gather(*(get_results(s, endpoint, headers) for endpoint in endpoints))


asyncio.run(main()) 
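
One caveat with the gather call above: it fires every suburb request at once, and that burst might be enough to trigger a 429 on its own (an assumption on my part, I haven't measured the site's limits). Here is a minimal sketch of capping the number of in-flight requests with asyncio.Semaphore. The names gather_limited, fake_fetch and limit are hypothetical, and fake_fetch is only a stand-in so the sketch runs without touching the site.

```python
import asyncio


async def gather_limited(fetch, endpoints, limit=5):
    """Run fetch(endpoint) for every endpoint, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def worker(endpoint):
        # Waits here while `limit` fetches are already in flight
        async with sem:
            return await fetch(endpoint)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(worker(e) for e in endpoints))


async def fake_fetch(endpoint):
    # Hypothetical stand-in for the real request, so this runs offline
    await asyncio.sleep(0.01)
    return f'fetched {endpoint}'


if __name__ == '__main__':
    names = ['sydney', 'newtown', 'manly']  # made-up endpoint values
    print(asyncio.run(gather_limited(fake_fetch, names, limit=2)))
```

In main() this would take the place of the plain gather call, for example: await gather_limited(lambda e: get_results(s, e, headers), endpoints, limit=5). The right value for limit is a guess and would need tuning against the site.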

Upvotes: 1
