Reputation: 158
I have been trying to fetch some data from nytimes.com, but unfortunately the interesting part is missing. I'm trying to get the search results for a given query. When I do it with Postman or file_get_contents in PHP, the result is the same: I don't get the resultSearch section. I've read that I may need cookies or authorization, but nothing helped. Any ideas? PS: I tried many variants, with and without options such as followRedirects.
String searchPhrase = "africa flood death";
try
{
    // Build the search URL (whitespace replaced by '+') and fetch the raw server response
    Connection.Response response = Jsoup.connect("http://query.nytimes.com/search/sitesearch/#/" + searchPhrase.replaceAll("\\s+", "+"))
        .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
        .referrer("http://www.google.com")
        .timeout(12000)
        .followRedirects(true)
        .execute();
}
catch (IOException e)
{
    e.printStackTrace();
}
Upvotes: 0
Views: 297
Reputation: 17745
The results are fetched via AJAX and generated dynamically by JavaScript. Jsoup can't handle that, because it only parses HTML and does not execute scripts. The content you get from a plain parser is what you see when you press Ctrl+U (view source) in Chrome. That is the HTML the server generates, and it is the only content you will get, whether through Jsoup or file_get_contents in PHP. If you want the JavaScript-generated content, you have to use something like Selenium, which drives a browser that includes a JavaScript engine. Selenium will run the JavaScript code and then let you grab the resulting content.
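A related detail worth checking in the URL from the question: everything after the `#` is a URL fragment, and an HTTP client does not send the fragment to the server. So the search phrase likely never reaches nytimes.com in that request at all; it is only read by the page's JavaScript in the browser. Below is a minimal sketch using only `java.net.URI` to show the split. The class and method names (`FragmentDemo`, `serverVisiblePart`, `clientOnlyPart`) are made up for illustration and are not part of Jsoup or Selenium.

```java
import java.net.URI;

public class FragmentDemo {

    // The part of the URL that is actually requested from the server:
    // everything before the '#'.
    static String serverVisiblePart(String url) {
        String raw = URI.create(url).toString();
        int hash = raw.indexOf('#');
        return hash < 0 ? raw : raw.substring(0, hash);
    }

    // The fragment, which stays on the client and is only visible to
    // scripts running in the page. Empty string if there is none.
    static String clientOnlyPart(String url) {
        String fragment = URI.create(url).getFragment();
        return fragment == null ? "" : fragment;
    }

    public static void main(String[] args) {
        String url = "http://query.nytimes.com/search/sitesearch/#/africa+flood+death";
        System.out.println("Sent to server:  " + serverVisiblePart(url));
        System.out.println("Kept in browser: " + clientOnlyPart(url));
    }
}
```

For this URL the server-visible part is just `http://query.nytimes.com/search/sitesearch/`, which is consistent with getting the same result-free page from Jsoup, Postman, and file_get_contents no matter what the search phrase is.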
Upvotes: 1