user1547050

Reputation: 337

Gathering data from HTML files

As part of a job, I have to go through this page and gather information on the school administrator, address, and county of each school. I have done a decent amount of work in Java, so I figured that if I was going to build something to do this, it should be in Java.

However, I haven't done anything like this before and am a little confused about where to start. If someone could tell me which classes I need to use, and give me a little information on how to go through the HTML to pull all of this out, that would be great. Thanks.

Upvotes: 1

Views: 146

Answers (3)

Zale

Reputation: 65

Selenium could work quite well for what you want to do. I'm using it to develop an application with automated tests, but it would work for you as well.

Upvotes: 0

Mikou

Reputation: 99

You can use java.util.regex; regular expressions are useful and simple to use.
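For instance, a minimal sketch of the java.util.regex approach, assuming each school appears on the page as a plain anchor tag with a double-quoted href. The class name SchoolLinkRegex, the method extractLinks, and the sample HTML are made up for illustration and are not taken from the real page. A pattern like this only holds up while the markup stays this regular.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchoolLinkRegex {
    // Matches <a ... href="URL" ...>TEXT</a>; group 1 is the URL, group 2 the link text.
    // Assumes double-quoted href values and no nested tags inside the link text.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*href=\"([^\"]*)\"[^>]*>([^<]*)</a>",
            Pattern.CASE_INSENSITIVE);

    // Returns one "text -> url" string per anchor found in the given HTML.
    public static List<String> extractLinks(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            results.add(m.group(2).trim() + " -> " + m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical markup, not copied from the real page.
        String html = "<p><a href=\"/schools/1\">Example Elementary</a></p>"
                    + "<p><a href=\"/schools/2\">Sample High School</a></p>";
        for (String line : extractLinks(html)) {
            System.out.println(line);
        }
    }
}
```

To run it against the real page you would first read the HTML into a String and then adjust the pattern to whatever tags actually surround the administrator, address, and county fields.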

Upvotes: 0

jsalonen

Reputation: 30481

You need to implement a scraper, i.e. an application that scrapes data out of HTML.

I'd start by looking into a decent scraper library, like jsoup (http://jsoup.org/) and see if you can use it to do the job.

In essence you will end up with something like:

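// Needs imports: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.select.Elements
Document doc = Jsoup.connect("http://www.ncpublicschools.org/...").get(); // fetch and parse the page
Elements schools = doc.select("div.indenter p span.colorText2 a"); // elements matching the CSS selector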

Just keep on applying select rules as necessary to collect the data you need.

Upvotes: 4
