Reputation: 33409
I am trying to programmatically submit a form (POST request) on a remote site from a command-line Node.js script and scrape the returned data.
The remote form is here.
When I submit it through the browser, the request first goes to the page itself (specified in the <form action> attribute), which returns a 302 status code redirecting to a different page, which prints the data.
However, when I make the POST request programmatically via Node.js, I get a 200 Server Busy response. I have also tried equivalent code in PHP, but no dice.
I am passing the headers, cookies, and form data to try and simulate the browser's request, copied from Chrome's network inspector.
var url = 'http://www.meteo.co.il/StationReportFast.aspx?ST_ID=120';
var request = require('request');
var jar = request.jar();
jar.setCookie(request.cookie("ASP.NET_SessionId=tsytqpkr04g5w2bfsu3fncbx"), url);
jar.setCookie(request.cookie("arp_scroll_position=177"), url);
//console.log(jar)
request.post(
    url, {
        form: {
            '__EVENTTARGET' : '',
            '__EVENTARGUMENT' : '',
            'chkAll' : 'on',
            'lstMonitors' : '%3CWebTree%3E%3CNodes%3E%3ClstMonitors_1%20Checked%3D%22true%22%3E%3C/lstMonitors_1%3E%3ClstMonitors_2%20Checked%3D%22true%22%3E%3C/lstMonitors_2%3E%3ClstMonitors_3%20Checked%3D%22true%22%3E%3C/lstMonitors_3%3E%3ClstMonitors_4%20Checked%3D%22true%22%3E%3C/lstMonitors_4%3E%3ClstMonitors_5%20Checked%3D%22true%22%3E%3C/lstMonitors_5%3E%3ClstMonitors_6%20Checked%3D%22true%22%3E%3C/lstMonitors_6%3E%3ClstMonitors_7%20Checked%3D%22true%22%3E%3C/lstMonitors_7%3E%3ClstMonitors_8%20Checked%3D%22true%22%3E%3C/lstMonitors_8%3E%3ClstMonitors_9%20Checked%3D%22true%22%3E%3C/lstMonitors_9%3E%3ClstMonitors_10%20Checked%3D%22true%22%3E%3C/lstMonitors_10%3E%3ClstMonitors_11%20Checked%3D%22true%22%3E%3C/lstMonitors_11%3E%3ClstMonitors_12%20Checked%3D%22true%22%3E%3C/lstMonitors_12%3E%3ClstMonitors_13%20Checked%3D%22true%22%3E%3C/lstMonitors_13%3E%3ClstMonitors_14%20Checked%3D%22true%22%3E%3C/lstMonitors_14%3E%3C/Nodes%3E%3C/WebTree%3E',
            'RadioButtonList1' : '0',
            'RadioButtonList2' : '0',
            'BasicDatePicker1$TextBox' : '02/02/2015',
            'txtStartTime' : '00:00',
            'txtStartTime_p' : '2015-2-3-0-0-0-0',
            'BasicDatePicker2$TextBox' : '03/02/2015',
            'txtEndTime' : '00:00',
            'txtEndTime_p' : '2015-2-3-0-0-0-0',
            'ddlAvgType' : 'AVG',
            'ddlTimeBase' : '60',
            'btnGenerateReport' : 'הצג דוח',
            'txtErrorMonitor' : 'אנא בחר לפחות מוניטור אחד',
            'txtErrorTimeBase' : 'בחר בסיס זמן',
            'txtError2Y' : 'Select2Monitors'
        },
        jar: jar,
        headers: {
            Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate',
            Host: 'www.meteo.co.il',
            Origin: 'http://www.meteo.co.il',
            Referer: 'http://www.meteo.co.il/StationReportFast.aspx?ST_ID=120',
            'Content-Type': 'application/x-www-form-urlencoded'
        }
    }, function (error, response, body) {
        if (!error && response.statusCode == 200) {
            console.log(body)
        } //else {
            console.log(arguments)
        //}
    }
);
I'm pretty sure that the issue is not with Hebrew in the POST data. I created a test server that just printed the headers and POST data, and this code worked fine pointing there.
How can I simulate this request?
Update: I tried a few other URLs from a different domain. http://www.mop-zafon.org.il/csv/cgi-bin/picman.cgi works, while http://www.mop-zafon.net/DynamicTable.aspx?G_ID=0 does not.
Could the problem be caused by making a POST request to a URL that also contains a query string?
Upvotes: 0
Views: 654
Reputation: 33409
It turned out that the request needed the User-Agent header set. I guess the server only wanted to respond to a browser, not a script.
I also needed to include the __VIEWSTATE form data, using the method suggested by Sean Baker.
Finally, followAllRedirects: true needed to be added to the options object to make it follow the redirect.
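Put together, the three changes might look roughly like the sketch below. It only assembles the options object and makes no network call. The helper name buildPostOptions and the User-Agent string are illustrative assumptions (in practice you would probably copy the browser's real User-Agent), and hiddenFields is assumed to hold the hidden inputs, including __VIEWSTATE, taken from a fresh GET of the page.

```javascript
// Hypothetical helper: builds the options for request.post() with the three fixes.
function buildPostOptions(url, hiddenFields, formFields, jar) {
    var form = {};
    // Hidden fields from the initial page load (e.g. __VIEWSTATE) go in first...
    Object.keys(hiddenFields).forEach(function (key) {
        form[key] = hiddenFields[key];
    });
    // ...then the values the script wants to submit, which win on conflict.
    Object.keys(formFields).forEach(function (key) {
        form[key] = formFields[key];
    });
    return {
        url: url,
        form: form,
        jar: jar,
        // Follow the 302 even though it is the response to a POST.
        followAllRedirects: true,
        headers: {
            // Placeholder value (an assumption); the server rejected requests without this header.
            'User-Agent': 'Mozilla/5.0 (compatible; report-scraper)',
            Referer: url
        }
    };
}
```

The result can then be passed as request.post(buildPostOptions(url, hidden, fields, jar), callback). Reusing the same jar for the initial GET and for the POST should keep the session cookie consistent with the __VIEWSTATE value.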
Upvotes: 1
Reputation: 662
Are you sending the __VIEWSTATE field back with the request? The site appears to send it to you encrypted on the initial page request, and it likely contains CSRF protection. I'd try having the script make a genuine page request first, grab all of the hidden elements, and then submit them back to see if you still get the 200 instead of the 302.
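As a rough sketch of the "grab all of the hidden elements" step, something like the following could pull them out of the HTML returned by the initial GET. extractHiddenFields is a hypothetical helper, and the regex matching is a simplification: it assumes double-quoted attributes and does not decode HTML entities, so a real HTML parser (cheerio, for example) would be more robust.

```javascript
// Hypothetical helper: collects name/value pairs of all <input type="hidden"> tags.
function extractHiddenFields(html) {
    var fields = {};
    var inputTags = html.match(/<input\b[^>]*>/gi) || [];
    inputTags.forEach(function (tag) {
        // Skip anything that is not a hidden input.
        if (!/\stype\s*=\s*["']?hidden["']?/i.test(tag)) {
            return;
        }
        var name = /\sname\s*=\s*"([^"]*)"/i.exec(tag);
        var value = /\svalue\s*=\s*"([^"]*)"/i.exec(tag);
        if (name) {
            // A hidden input without a value attribute is submitted as an empty string.
            fields[name[1]] = value ? value[1] : '';
        }
    });
    return fields;
}
```

The returned object can be merged into the form data of the POST, so that __VIEWSTATE and any other hidden fields go back to the server exactly as they were received.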
Upvotes: 1