Ashish Negi

Reputation: 5301

Getting Data from Internet in Java

I thought of making the following application for my college project in Java. I know core Java. I want to know what I should read, specifically, for this project, as there is little time:

It will have an interface to enter a query. This string will go as a query to internet search engines, and with the search engine's help the application will find the data (for now, the data is just the first web page we see in the results :) ).
I do not want to display the data; I just want the HTML file, i.e. the source code of the generated web page. Does this sound like the Common Gateway Interface (CGI)? I do not know about it.

But I think it serves the same purpose. If it is CGI, please guide me on how to implement this. Whatever it is, please be specific.

For example, when we search something on Google, it shows us links to websites. I can view the source code of that generated results page. That page is all I want, so my application can work on it.

EDIT: I do not want to rely only on Google or on any particular web server. I want my application to decide that.
Please also refer to my problem 2.

As I discovered that websites have Terms of Service, should I try to make my own crawler instead? Would my application then not be breaking the rules? This is important for me.

Upvotes: 3

Views: 8199

Answers (5)

amal

Reputation: 1369

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Read the raw HTML of a page line by line and print it.
URL url = new URL("http://fooooo.com");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
    System.out.println(inputLine);
}
in.close();

Should be enough to get you started.

And yes, do check that you are not violating a website's terms of use. Search engines don't really like you trying to access them via a program.

Many, including Google, have APIs specifically designed for this purpose.

Upvotes: 4

Rajeev Sreedharan

Reputation: 1753

I do not want to display the data. I just want the HTML file or the source code of the generated web page.

You probably don't need the HTML either. Google provides its search results as a web service using this API; similarly for other search engines (GIYF). You get the search results as XML, which is far easier for you to parse. Plus, the XML won't contain any unwanted data like ads.
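As a sketch of the parsing step: once you have the XML response as a string, the JDK's built-in DOM parser is enough, no extra libraries needed. The `<results>`/`<result>`/`<title>` element names below are hypothetical; substitute whatever tags the search API you pick actually returns.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XmlResultParser {

    // Extracts the text of every <title> element from an XML search response.
    public static List<String> extractTitles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Sample response shape (hypothetical, for illustration only).
        String sample = "<results><result><title>First hit</title></result>"
                + "<result><title>Second hit</title></result></results>";
        System.out.println(extractTitles(sample)); // prints [First hit, Second hit]
    }
}
```

The same `getElementsByTagName` call works for URLs or snippets, just swap in the tag name the API uses.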

Upvotes: 1

Santosh

Reputation: 17893

Ashish, here is what I would recommend.

  1. Learn the basics of JSON from these links (Introduction, lib download).
  2. Then look at the Google Web Search JSON API here.
  3. Learn how to GET data from servers using the HttpClient library here.
  4. Now what you have to do is: fire a GET request for the search, read the JSON response, parse the response using the JSON lib from #1, and you have the search results.
  5. Most search engines (Bing etc.) offer JSON/REST APIs, so you can do the same for other search engines.

Note: JSON APIs are normally used from JavaScript on the UI side, but since JSON is very easy and quick to learn, I suggested it. You can also explore the XML-based APIs (if time permits).
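Steps 3 and 4 above can be sketched roughly as follows. This uses the JDK's `HttpURLConnection` instead of the Apache HttpClient library so the example has no external dependencies, and the regex-based `extractValues` is only a stand-in for a real JSON parser. The endpoint, the `q` parameter, and the `"url"` key are all assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchClient {

    // Step 3: fire a GET request and read the raw response body as a string.
    public static String get(String endpoint, String query) throws Exception {
        URL url = new URL(endpoint + "?q=" + URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }

    // Step 4 (simplified): pull out every string value stored under a given
    // key. A real project should use a JSON library instead of a regex.
    public static List<String> extractValues(String json, String key) {
        Pattern p = Pattern.compile("\"" + key + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical response shape, for illustration only.
        String sample = "{\"results\":[{\"url\":\"http://a.example\"},"
                + "{\"url\":\"http://b.example\"}]}";
        System.out.println(extractValues(sample, "url"));
        // prints [http://a.example, http://b.example]
    }
}
```

A real implementation would call `get(...)` against the search API's endpoint and feed the result to a proper JSON parser such as the lib from #1.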

Upvotes: 5

PeterMmm

Reputation: 24630

Read "Working with URLs" in the Java tutorial to get an idea of what lies behind the available libraries like HtmlUnit, HttpClient, etc.

Upvotes: 1

Mobile Developer

Reputation: 5760

You can do everything you want using HtmlUnit. It's like a web browser, but for Java. Check out some examples on their website.

Upvotes: 2
