Reputation: 49
I get a 403 response code from this program, but I need a 200 to get the search results back. What can I do?
String url="http://www.google.com/search?q=";
String charset="UTF-8";
String key="java";
String query = String.format("%s",URLEncoder.encode(key, charset));
URLConnection con = new URL(url+ query).openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
    System.out.println(inputLine);
in.close();
Upvotes: 1
Views: 6629
Reputation: 10094
The 403 response is clear enough: Google's servers are telling you that what you are doing is neither authorized nor tolerated.
Google prohibits automated queries, so if you send them you risk being blocked at any time.
If you want to go down this road, you'll have to understand why you are being blocked (User-Agent, IP address, header fingerprinting, etc.). They have many ways to tell whether you're a bot.
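A first step in working out why a request is refused is to look at the status code instead of letting getInputStream() throw. The sketch below is only an illustration: startPickyServer is a hypothetical local stand-in that rejects any User-Agent containing "Java/" (the default that HttpURLConnection sends), and it does not reproduce Google's actual rules, which are not public. The class and method names are made up for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

final class StatusProbe {
    // Hypothetical stand-in for a bot filter (an assumption, not Google's logic):
    // answers 403 when the User-Agent is missing or contains "Java/", 200 otherwise.
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/search", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            int status = (agent == null || agent.contains("Java/")) ? 403 : 200;
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    // Returns the HTTP status for url; a null userAgent keeps Java's default header.
    static int status(String url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        if (userAgent != null) {
            con.setRequestProperty("User-Agent", userAgent);
        }
        try {
            // getResponseCode() reports error statuses as plain numbers
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startPickyServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/search?q=java";
            System.out.println("default agent: " + status(url, null));
            System.out.println("browser-like agent: " + status(url, "Mozilla/5.0"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against a real site the same status method tells you whether a change to one header makes a difference, but a filter may also look at the IP address or at other headers, so a 200 from this toy server proves nothing about Google.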
Upvotes: 3
Reputation: 5155
As an alternative to JSoup, you can use this package.
Code sample:
Map<String, String> parameter = new HashMap<>();
parameter.put("q", "Coffee");
parameter.put("location", "Portland");
GoogleSearchResults serp = new GoogleSearchResults(parameter);
JsonObject data = serp.getJson();
JsonArray results = (JsonArray) data.get("organic_results");
JsonObject first_result = results.get(0).getAsJsonObject();
System.out.println("first coffee: " + first_result.get("title").getAsString());
Upvotes: 0
Reputation: 31595
Try jsoup:
Document document = Jsoup
        .connect("http://www.google.com/search?q=" + query)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0")
        .get();
System.out.println(document.html());
To extract links, use jsoup's selector API.
Dependency:
<dependency>
    <!-- jsoup HTML parser library @ http://jsoup.org/ -->
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.3</version>
</dependency>
Upvotes: 4
Reputation: 1723
Google blocks the default User-Agent that Java sends. You can send a different one to get past the block. Simply add:
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
after creating con and before you start reading from it.
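For context, here is a sketch of the whole sequence with the header set in the right place. It reuses the URL from the question and the User-Agent string above; the class name SearchWithUserAgent, the methods open and readBody, and the --fetch switch are illustrative names, not part of any library. Whether Google accepts the request still depends on its current filtering.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchWithUserAgent {
    static final String USER_AGENT =
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2";

    // Creates the connection and sets the header. Nothing is sent yet:
    // openConnection() only prepares the request.
    static URLConnection open(String key) throws IOException {
        String query = URLEncoder.encode(key, "UTF-8");
        URLConnection con = new URL("http://www.google.com/search?q=" + query).openConnection();
        con.setRequestProperty("User-Agent", USER_AGENT); // must happen before any read
        return con;
    }

    // getInputStream() is the call that actually sends the request.
    static String readBody(URLConnection con) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        URLConnection con = open("java");
        System.out.println("URL: " + con.getURL());
        System.out.println("User-Agent: " + con.getRequestProperty("User-Agent"));
        // Pass --fetch to send the request and print the response body.
        if (args.length > 0 && "--fetch".equals(args[0])) {
            System.out.println(readBody(con));
        }
    }
}
```

Calling setRequestProperty after the connection has been opened for reading throws an IllegalStateException, which is why the header has to be set first.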
Upvotes: 1