Reputation: 337
As part of my job, I have to go through this page and gather the school administrator, address, and county for each school. I have done a decent amount of work in Java, so I figured that if I was going to write something to do this, it should be in Java.
However, I haven't done anything like this before, and I'm a little unsure where to start. If someone could tell me which classes I need to use, and give me some idea of how to go through the HTML to pull all of this out, that would be great. Thanks.
Upvotes: 1
Views: 146
Reputation: 65
Selenium could work quite well for what you want to do. I'm using it to develop an application with automated tests, but it would work for scraping a page like this as well.
Upvotes: 0
Reputation: 99
You can use java.util.regex; regular expressions are useful and simple to use.
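As a rough sketch of what that could look like, the snippet below pulls the href and link text out of anchor tags with java.util.regex. The class name, the method name, and the sample HTML are made up for illustration and are not taken from the actual page. The pattern assumes double-quoted href values and no nested tags inside the link, so it will likely need adjusting, and regexes tend to break when the markup changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class SchoolLinkExtractor {

    // Matches <a ... href="URL" ...>TEXT</a>; group 1 is the URL, group 2 the link text.
    // Assumption: href is double-quoted and the link body contains no other tags.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*href=\"([^\"]*)\"[^>]*>([^<]*)</a>",
            Pattern.CASE_INSENSITIVE);

    // Returns one {url, text} pair per anchor found in the given HTML.
    static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample markup; the real page structure may differ.
        String html = "<p><span class=\"colorText2\">"
                + "<a href=\"/schools/1\">Example Elementary</a></span></p>"
                + "<p><span class=\"colorText2\">"
                + "<a href=\"/schools/2\">Sample High</a></span></p>";
        for (String[] link : extractLinks(html)) {
            System.out.println(link[1] + " -> " + link[0]);
        }
    }
}
```

To run it against the real page you would first read the page into a String (for example via java.net.URL and a BufferedReader) and pass that to extractLinks.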
Upvotes: 0
Reputation: 30481
You need to implement a scraper, i.e. an application that scrapes data out of HTML.
I'd start by looking into a decent scraper library, like jsoup (http://jsoup.org/) and see if you can use it to do the job.
In essence you will end up with something like:
Document doc = Jsoup.connect("http://www.ncpublicschools.org/...").get();
Elements schools = doc.select("div.indenter p span.colorText2 a");
Just keep applying further select() calls with CSS selectors as necessary to collect the data you need.
Upvotes: 4