Reputation: 3
I'm trying to scrape data from https://www.realtor.com/realestateagency/. For example, if I type "Los Angeles" in the search bar and press Enter while watching the Network tab in the dev tools, I can see the following request being made, which returns a JSON response:
curl 'https://www.realtor.com/realestateagents/api/v3/search?nar_only=1&offset=&limit=20&marketing_area_cities=CA_Los%20Angeles&postal_code=&is_postal_search=true&name=&types=office&sort=recent_activity_high&far_opt_out=false&client_id=FAR2.0&recommendations_count_min=&agent_rating_min=&languages=&agent_type=&price_min=&price_max=&designations=&photo=true&seoUserType=\{%22isBot%22:false,%22deviceType%22:%22desktop%22\}&is_county_search=false&county=' \
-H 'accept: application/json, text/plain, */*' \
-H 'accept-language: en-US,en;q=0.9' \
-H 'authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3MzAyODQ1NDksInN1YiI6ImZpbmRfYV9yZWFsdG9yIiwiaWF0IjoxNzMwMjg0NTQ0fQ.01rdeP19KMDha2A-KRRzU-qDUeNM7plpSLWrPECW8l4' \
-H 'cookie: __vst=1348d7fe-c4e3-48be-9494-aead34930b9a; __ssn=ad0cc313-5a2f-45dc-9ccc-de85634738ca; __ssnstarttime=1730228077; split=n; split_tcv=173; __bot=false; AMCVS_8853394255142B6A0A4C98A4%40AdobeOrg=1; sbjs_migrations=1418474375998%3D1; sbjs_current_add=fd%3D2024-10-29%2019%3A54%3A40%7C%7C%7Cep%3Dhttps%3A%2F%2Fwww.realtor.com%2Frealestateagents%2F-David-Murrah_Perry_FL_3772799_31099381%7C%7C%7Crf%3D%28none%29; sbjs_first_add=fd%3D2024-10-29%2019%3A54%3A40%7C%7C%7Cep%3Dhttps%3A%2F%2Fwww.realtor.com%2Frealestateagents%2F-David-Murrah_Perry_FL_3772799_31099381%7C%7C%7Crf%3D%28none%29; sbjs_current=typ%3Dtypein%7C%7C%7Csrc%3D%28direct%29%7C%7C%7Cmdm%3D%28none%29%7C%7C%7Ccmp%3D%28none%29%7C%7C%7Ccnt%3D%28none%29%7C%7C%7Ctrm%3D%28none%29; sbjs_first=typ%3Dtypein%7C%7C%7Csrc%3D%28direct%29%7C%7C%7Cmdm%3D%28none%29%7C%7C%7Ccmp%3D%28none%29%7C%7C%7Ccnt%3D%28none%29%7C%7C%7Ctrm%3D%28none%29; _cq_duid=1.1730228081.eC3TivjY17uFRhxZ; _cq_suid=1.1730228081.WoBNGPcoViNZa7Wy; _lr_env_src_ats=false; G_ENABLED_IDPS=google; mdLogger=false; kampyle_userid=0121-d327-b311-2048-ddd5-eda6-8974-dc4b; DECLINED_DATE=1730228477981; kampylePageLoadedTimestamp=1730238384727; g_state={"i_p":1730324843693,"i_l":2}; sbjs_udata=vst%3D3%7C%7C%7Cuip%3D%28none%29%7C%7C%7Cuag%3DMozilla%2F5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F128.0.0.0%20Safari%2F537.36%20OPR%2F114.0.0.0; AMCV_8853394255142B6A0A4C98A4%40AdobeOrg=-1124106680%7CMCIDTS%7C20026%7CMCMID%7C68044453388304758410257228045712879539%7CMCAID%7CNONE%7CMCOPTOUT-1730291728s%7CNONE%7CvVersion%7C5.2.0; kampyleUserSession=1730284530617; kampyleUserSessionsCount=3; kampyleSessionPageCounter=1; AWSALBTG=zhVJZ+b9dYH3H1uJFKAhqonrBKduLg/JPFRrefCfTWy/qq9A25TAusduMyGukiXv0txOEWXFl2tiysxUeFiB0ksS9a/aPqIgCnDzYBi8UbvdCfQecyQOcXsMceRrB22xkx+JHR8ipOJeEGhtzIePNtmmUV4KxRwKU1GwVZm1g7lp; 
AWSALBTGCORS=zhVJZ+b9dYH3H1uJFKAhqonrBKduLg/JPFRrefCfTWy/qq9A25TAusduMyGukiXv0txOEWXFl2tiysxUeFiB0ksS9a/aPqIgCnDzYBi8UbvdCfQecyQOcXsMceRrB22xkx+JHR8ipOJeEGhtzIePNtmmUV4KxRwKU1GwVZm1g7lp; AWSALB=9vqzUA7U7p7EVptY+e9rD+wFd3qKCGYdzalEd4YZ/btYqqRvxAPv1vsG57kG00zugz6EFrwx7Rf4lObfYzzBvsWJD8BRSVwzUb+R/XQurMcdAEDMr8Liquemothd; AWSALBCORS=9vqzUA7U7p7EVptY+e9rD+wFd3qKCGYdzalEd4YZ/btYqqRvxAPv1vsG57kG00zugz6EFrwx7Rf4lObfYzzBvsWJD8BRSVwzUb+R/XQurMcdAEDMr8Liquemothd' \
-H 'priority: u=1, i' \
-H 'referer: https://www.realtor.com/realestateagents/' \
-H 'sec-ch-ua: "Chromium";v="128", "Not;A=Brand";v="24", "Opera GX";v="114"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 OPR/114.0.0.0' \
-H 'x-newrelic-id: VwEPVF5XGwQHXFNTBAcAUQ=='
But when I use Postman or curl-converter websites to mimic this request, the response is always: "You are not authorized to access this request"
Are there extra steps I could take to get the response? Maybe there are special tokens I need to extract from my browser and add to the request to authenticate it, or maybe a framework like Selenium can capture the response when the browser makes the request and return it to my Python script.
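To check whether the bearer token is one of those "special tokens", its payload can be decoded locally. The token has the three-part shape of a JWT, so the sketch below base64url-decodes the middle part of the token copied from the `authorization` header above. The helper name `decode_jwt_payload` is my own, and the signature is not verified; this only reads the claims.

```python
import base64
import json

# Bearer token copied verbatim from the `authorization` header in the curl command.
TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJleHAiOjE3MzAyODQ1NDksInN1YiI6ImZpbmRfYV9yZWFsdG9yIiwiaWF0IjoxNzMwMjg0NTQ0fQ."
    "01rdeP19KMDha2A-KRRzU-qDUeNM7plpSLWrPECW8l4"
)


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature (hypothetical helper)."""
    payload_b64 = token.split(".")[1]
    # base64url segments in a JWT have their '=' padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


claims = decode_jwt_payload(TOKEN)
print(claims["sub"])                  # find_a_realtor
print(claims["exp"] - claims["iat"])  # 5
```

The `exp` and `iat` claims are only 5 seconds apart, so by the time the copied request is replayed the token has most likely expired. That would be consistent with the "not authorized" response, although I'm only assuming the server enforces `exp`. If so, a fresh token would have to be obtained for each request rather than copied from the browser.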
Upvotes: 0
Views: 69