Rilcon42

Reputation: 9763

axios get the redirect URL of page returned by post

I am trying to get the URL of a webpage after a redirect from Axios, but nothing in the Axios config seems to give me the expected URL after the redirect.

I think I'm passing in the correct search box parameter to get my output URL, but I can't tell for certain because I can't see the response URL.

My goal is to get the URL: https://www.redfin.com/GA/Lawrenceville/2105-Bentbrooke-Trl-30043/home/24906463

My code:

var querystring = require("querystring");
let axios = require("axios");

async function run() {
  let rf = await axios.post(
    "https://www.redfin.com",
    querystring.stringify({
      searchInputBox: "2105 bentbrooke trl",
    }),
    {
      headers: {
        "Content-Type": "application/x-www-form-urlencoded",
      },
    }
  );
  console.log(`RF URL: `, rf.config, rf.request.res.responseUrl);
}
run();

My current output:

RF URL:  {
  url: 'https://www.redfin.com',
  method: 'post',
  data: 'searchInputBox=2105%20bentbrooke%20trl',       
  headers: {
    Accept: 'application/json, text/plain, */*',        
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'axios/0.21.1',
    'Content-Length': 38
  },
  transformRequest: [ [Function: transformRequest] ],   
  transformResponse: [ [Function: transformResponse] ], 
  timeout: 0,
  adapter: [Function: httpAdapter],
  xsrfCookieName: 'XSRF-TOKEN',
  xsrfHeaderName: 'X-XSRF-TOKEN',
  maxContentLength: -1,
  maxBodyLength: -1,
  validateStatus: [Function: validateStatus]
} https://www.redfin.com/

There is a similar question, but none of its suggested answers worked for me (I think Axios may have been updated, or the async call I am using makes a difference): how to get the landing page URL after redirections using axios

Upvotes: 0

Views: 8576

Answers (1)

Petr Hejda

Reputation: 43491

Your snippet is working as expected. It sends a POST request to the URL with a form Content-Type header, and the POST data is encoded correctly for that header.

curl --request POST 'https://www.redfin.com/' --header 'Content-Type: application/x-www-form-urlencoded' --data-urlencode 'searchInputBox=2105 bentbrooke trl'
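The encoding can be checked locally without sending any request. This is a minimal sketch that uses only Node's built-in querystring module, as the question does. The Buffer.byteLength check is an addition for illustration; it should agree with the Content-Length of 38 shown in the question's output.

```javascript
const querystring = require('querystring');

// Same form field as in the question's axios.post() call.
const body = querystring.stringify({ searchInputBox: '2105 bentbrooke trl' });

// The encoded body, as it appears under `data` in the logged config.
console.log(body);

// Byte length of the body, which is what the Content-Length header reports.
console.log(Buffer.byteLength(body));
```

So the request itself is well formed; the missing redirect is down to how the site handles its search, not to the encoding.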

But each website handles its forms differently, and this is not how this website handles its search form.


I did a little reverse engineering: I submitted the form on the website, inspected the requests in the Network tab of my browser's devtools, and then cleaned them up in Postman to find and remove the parameters that are not required.

This website requires a GET request with query params location=<your query> and v=2 (probably a version of the search engine). It also requires a User-Agent header to be set, otherwise it returns a captcha page instead of the expected result.

This request returns two JSON objects separated by && (the first one is empty). Their JS code ignores the first object and parses only the second, then redirects to the exact match if one is available. Otherwise it shows a list of results.

CURL request:

curl 'https://www.redfin.com/stingray/do/query-location?location=2105%20bentbrooke%20trl&v=2' --header 'User-Agent: bot'

Returned data:

{}&&{
    "version": 380,
    "errorMessage": "Success",
    "resultCode": 0,
    "payload": {
        "sections": [
            {
                "rows": [
                    {
                        "id": "1_24906463",
                        "type": "1",
                        "name": "2105 Bentbrooke Trl",
                        "subName": "Lawrenceville, GA, USA",
                        "url": "/GA/Lawrenceville/2105-Bentbrooke-Trl-30043/home/24906463",
                        "active": true,
                        "claimedHome": false,
                        "invalidMRS": false,
                        "businessMarketIds": [
                            14
                        ],
                        "countryCode": "US",
                        "searchStatusId": 2
                    }
                ],
                "name": "Addresses"
            }
        ],
        "exactMatch": {
            "id": "1_24906463",
            "type": "1",
            "name": "2105 Bentbrooke Trl",
            "subName": "Lawrenceville, GA, USA",
            "url": "/GA/Lawrenceville/2105-Bentbrooke-Trl-30043/home/24906463",
            "active": true,
            "claimedHome": false,
            "invalidMRS": false,
            "businessMarketIds": [
                14
            ],
            "countryCode": "US",
            "searchStatusId": 2
        },
        "extraResults": {},
        "responseTime": 0,
        "hasFakeResults": false,
        "isGeocoded": false,
        "isRedfinServiced": false
    }
}

Converting the curl request, the response parsing, and the exact-match lookup to Node.js with Axios gives this code:

const axios = require('axios');

const BASE_URL = 'https://www.redfin.com';

const getRedirectUrl = async (query) => {
    const response = await axios({
        'method': 'GET',
        'url': getSearchUrl(query),
        'headers': {
            'User-Agent': 'I\'m just a bot that passes the user-agent header',
        }
    });

    const result = getSecondObjectFromBody(response.data);

    // No payload or no exact match (e.g. an unknown address): nothing to redirect to.
    if (!result.payload || !result.payload.exactMatch) {
        return null;
    }

    return BASE_URL + result.payload.exactMatch.url;
}

const getSearchUrl = (query) => {
    return BASE_URL
        + '/stingray/do/query-location?location='
        + encodeURIComponent(query)
        + '&v=2';
}

const getSecondObjectFromBody = (body) => {
    return JSON.parse(body.slice(body.indexOf('&&') + 2));
}

const run = async () => {
    console.log(await getRedirectUrl('2105 bentbrooke trl')); // has exact match
    console.log(await getRedirectUrl('fdsfds')); // no exact match
}

run();

Which outputs:

https://www.redfin.com/GA/Lawrenceville/2105-Bentbrooke-Trl-30043/home/24906463
null
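One caveat: getSecondObjectFromBody assumes the body always contains the && separator followed by valid JSON. If the site returns something else, such as the captcha page mentioned above, JSON.parse will throw. Below is a more defensive sketch of the parsing step. The helper names parsePrefixedJson and getExactMatchUrl are hypothetical, and the sample body is a trimmed-down version of the response shown earlier, so treat it as an illustration rather than the site's contract.

```javascript
const SEPARATOR = '&&';

// Hypothetical helper: returns the object after the "{}&&" prefix,
// or null if the body does not have the expected shape.
const parsePrefixedJson = (body) => {
    if (typeof body !== 'string') {
        return null;
    }
    const index = body.indexOf(SEPARATOR);
    if (index === -1) {
        return null; // separator missing, e.g. an HTML captcha page
    }
    try {
        return JSON.parse(body.slice(index + SEPARATOR.length));
    } catch (err) {
        return null; // separator found, but the rest is not valid JSON
    }
};

// Hypothetical helper: builds the absolute URL of the exact match, or null.
const getExactMatchUrl = (body, baseUrl) => {
    const result = parsePrefixedJson(body);
    const match = result && result.payload && result.payload.exactMatch;
    return match && match.url ? baseUrl + match.url : null;
};

// Trimmed-down sample of the response body shown earlier.
const sample = '{}&&{"payload":{"exactMatch":{"url":"/GA/Lawrenceville/2105-Bentbrooke-Trl-30043/home/24906463"}}}';

console.log(getExactMatchUrl(sample, 'https://www.redfin.com'));
console.log(getExactMatchUrl('<html>captcha</html>', 'https://www.redfin.com'));
```

With this variant, an unexpected body yields null, the same value the original code returns when there is no exact match, so the caller only has one "not found" case to handle.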

Upvotes: 1
