Scimonster

Reputation: 33409

Unable to simulate a POST request -- not getting the correct response

I am trying to programmatically submit a form (POST request) on a remote site from a command-line Node.js script and scrape the returned data.

The remote form is here.

When I submit it through the browser, the POST first goes to the page itself (the URL specified in the <form action>), which returns a 302 status code redirecting to a different page, and that page prints the data.

However, when I make the POST request programmatically via Node.js, I get a 200 response whose body says "Server Busy". I have also tried equivalent code in PHP, with the same result.

I am passing the headers, cookies, and form data to try and simulate the browser's request, copied from Chrome's network inspector.

I am using the request module.

var url = 'http://www.meteo.co.il/StationReportFast.aspx?ST_ID=120';
var request = require('request');
var jar = request.jar();
jar.setCookie(request.cookie("ASP.NET_SessionId=tsytqpkr04g5w2bfsu3fncbx"), url);
jar.setCookie(request.cookie("arp_scroll_position=177"), url);

//console.log(jar)

request.post(
    url, {
         form: {
            '__EVENTTARGET' : '',
            '__EVENTARGUMENT' : '',
            'chkAll' : 'on',
            'lstMonitors' : '%3CWebTree%3E%3CNodes%3E%3ClstMonitors_1%20Checked%3D%22true%22%3E%3C/lstMonitors_1%3E%3ClstMonitors_2%20Checked%3D%22true%22%3E%3C/lstMonitors_2%3E%3ClstMonitors_3%20Checked%3D%22true%22%3E%3C/lstMonitors_3%3E%3ClstMonitors_4%20Checked%3D%22true%22%3E%3C/lstMonitors_4%3E%3ClstMonitors_5%20Checked%3D%22true%22%3E%3C/lstMonitors_5%3E%3ClstMonitors_6%20Checked%3D%22true%22%3E%3C/lstMonitors_6%3E%3ClstMonitors_7%20Checked%3D%22true%22%3E%3C/lstMonitors_7%3E%3ClstMonitors_8%20Checked%3D%22true%22%3E%3C/lstMonitors_8%3E%3ClstMonitors_9%20Checked%3D%22true%22%3E%3C/lstMonitors_9%3E%3ClstMonitors_10%20Checked%3D%22true%22%3E%3C/lstMonitors_10%3E%3ClstMonitors_11%20Checked%3D%22true%22%3E%3C/lstMonitors_11%3E%3ClstMonitors_12%20Checked%3D%22true%22%3E%3C/lstMonitors_12%3E%3ClstMonitors_13%20Checked%3D%22true%22%3E%3C/lstMonitors_13%3E%3ClstMonitors_14%20Checked%3D%22true%22%3E%3C/lstMonitors_14%3E%3C/Nodes%3E%3C/WebTree%3E',
            'RadioButtonList1' : '0',
            'RadioButtonList2' : '0',
            'BasicDatePicker1$TextBox' : '02/02/2015',
            'txtStartTime' : '00:00',
            'txtStartTime_p' : '2015-2-3-0-0-0-0',
            'BasicDatePicker2$TextBox' : '03/02/2015',
            'txtEndTime' : '00:00',
            'txtEndTime_p' : '2015-2-3-0-0-0-0',
            'ddlAvgType' : 'AVG',
            'ddlTimeBase' : '60',
            'btnGenerateReport' : 'הצג דוח',
            'txtErrorMonitor' : 'אנא בחר לפחות מוניטור אחד',
            'txtErrorTimeBase' : 'בחר בסיס זמן',
            'txtError2Y' : 'Select2Monitors'
        },
        jar: jar,
        headers: {
            Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate',
            Host: 'www.meteo.co.il',
            Origin: 'http://www.meteo.co.il',
            Referer: 'http://www.meteo.co.il/StationReportFast.aspx?ST_ID=120',
            'Content-Type': 'application/x-www-form-urlencoded'
        }
    }, function (error, response, body) {
        if (!error && response.statusCode == 200) {
            console.log(body);
        }
        // Debugging: always dump the raw callback arguments.
        console.log(arguments);
    }
);

I'm pretty sure that the issue is not with Hebrew in the POST data. I created a test server that just printed the headers and POST data, and this code worked fine pointing there.

How can I simulate this request?

Update: I tried a few other URLs from a different domain. http://www.mop-zafon.org.il/csv/cgi-bin/picman.cgi works, while http://www.mop-zafon.net/DynamicTable.aspx?G_ID=0 does not.

Is it possible that making a POST request to a URL that also has a query string is the problem?

Upvotes: 0

Views: 654

Answers (2)

Scimonster

Reputation: 33409

It turned out that the request needed the User-Agent header set. I guess the server only wanted to respond to a browser, not a script.

I also needed to include the __VIEWSTATE form data, using the method suggested by Sean Baker.

Finally, followAllRedirects: true needed to be added to the options object to make request follow the redirect returned by the POST.
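
For anyone landing here later, the three changes can be put together roughly like this. It is only a sketch: buildPostOptions is a made-up helper name, the User-Agent string is just an example of a browser-like value, and hiddenFields is assumed to be an object holding the hidden inputs (including __VIEWSTATE) taken from a fresh GET of the page.

```javascript
// Build the options object for request.post().
// Assumptions: the helper name and the User-Agent value are illustrative,
// and hiddenFields holds the hidden inputs scraped from a fresh GET of the page.
function buildPostOptions(url, jar, hiddenFields, formFields) {
    var form = {};
    // Hidden fields first (including __VIEWSTATE), then the visible form values.
    Object.keys(hiddenFields).forEach(function (key) {
        form[key] = hiddenFields[key];
    });
    Object.keys(formFields).forEach(function (key) {
        form[key] = formFields[key];
    });
    return {
        url: url,
        form: form,
        jar: jar,
        // request only follows redirects for GET by default; this also
        // follows the 302 that comes back from the POST.
        followAllRedirects: true,
        headers: {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
                '(KHTML, like Gecko) Chrome/40.0.2214.94 Safari/537.36',
            Referer: url
        }
    };
}
```

The result can be passed straight to request.post(options, callback), with the same callback as in the question.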

Upvotes: 1

Sean Baker

Reputation: 662

Are you sending the VIEWSTATE field back with the request? The site appears to send it to you, encrypted, on the initial page request, and it likely contains CSRF protection. I'd try having the script make a genuine page request first, grab all of the hidden elements, and then submit them back along with the form data to see whether you still get the 200 instead of the 302.
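
A rough sketch of the "grab all of the hidden elements" step, in case it helps. extractHiddenFields is a hypothetical helper, and the regex approach assumes simple markup with double-quoted name and value attributes; a proper HTML parser would be more robust, and any HTML entities in the values are not decoded here.

```javascript
// Collect the hidden <input> fields (e.g. __VIEWSTATE) from a page's HTML.
// Assumption: name and value attributes are double-quoted, as ASP.NET
// normally emits them. Entities inside values are left undecoded.
function extractHiddenFields(html) {
    var fields = {};
    var inputRe = /<input\b[^>]*>/gi;
    var match;
    while ((match = inputRe.exec(html)) !== null) {
        var tag = match[0];
        var isHidden = /\btype\s*=\s*["']?hidden\b/i.test(tag);
        var name = /\bname\s*=\s*"([^"]*)"/i.exec(tag);
        var value = /\bvalue\s*=\s*"([^"]*)"/i.exec(tag);
        if (isHidden && name) {
            // A hidden input without a value attribute is sent as an empty string.
            fields[name[1]] = value ? value[1] : '';
        }
    }
    return fields;
}
```

The idea would be to GET the form page with the same cookie jar, run the body through this function, merge the result into the form object, and only then send the POST.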

Upvotes: 1
