user3660700

Reputation: 13

Scrape HTML Table with JSoup

I'm attempting to scrape information from the table on this site (not live at the moment, using a saved .htm): https://web.archive.org/web/20140106024901/http://ftpcontent2.worldnow.com/wjrt/school/closings.htm

Essentially, I'm writing a program that reports which schools/businesses are closed, based on the first column of this table. I've tried using JSoup to save the data as an Element, but I can't find a table ID in the page's source code like the one used in this question: Using JSoup To Extract HTML Table Contents

<P><TABLE BORDER=0 CELLPADDING=2 CELLSPACING=1><TR><TD CLASS="timestamp" ALIGN=RIGHT>UPDATED SUNDAY, JAN  5 AT  9:45 PM</TD></TR><TR><TD BGCOLOR="#EEEEEE"><FONT CLASS="orgname">AARP Foundation&nbsp;[<a href="/web/20140106024901/http://www.aarpworksearch.org/" target=_new>WEB</A>]</FONT>: <FONT CLASS="status">Closed Tomorrow</FONT></TD></TR><TR><TD BGCOLOR="#DDDDDD"><FONT CLASS="orgname">Akron/Fairgrove&nbsp;[<a href="/web/20140106024901/http://www.a-f.k12.mi.us/" target=_new>WEB</A>]</FONT>: <FONT CLASS="status">Closed Tomorrow</FONT></TD></TR><TR><TD BGCOLOR="#EEEEEE"><FONT CLASS="orgname">Alcona&nbsp;[<a href="/web/20140106024901/http://www.alconaschools.net/" target=_new>WEB</A>]</FONT>: <FONT CLASS="status">Closed Tomorrow</FONT></TD></TR><TR><TD BGCOLOR="#DDDDDD"><FONT CLASS="orgname">Alma&nbsp;[<a href="/web/20140106024901/http://www.almaschools.net/" target=_new>WEB</A>]</FONT>: <FONT CLASS="status">Closed Tomorrow</FONT></TD>...

How do I save the data in this table?

Upvotes: 1

Views: 1343

Answers (1)

Syam S

Reputation: 8509

Luckily, the table in question is the only one on the page with colored cells, so you can select its cells by their bgcolor attribute instead of looking for a table ID. The program below prints each organization name together with its status. You can modify it to suit your needs.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;


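public class JsoupParser3 {

    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("https://web.archive.org/web/20140106024901/http://ftpcontent2.worldnow.com/wjrt/school/closings.htm").get();

            // Only the cells of the closings table carry a bgcolor attribute.
            for (Element cell : doc.select("td[bgcolor]")) {
                Element org = cell.select("font.orgname").first();
                Element status = cell.select("font.status").first();

                // first() returns null when nothing matches, so skip incomplete cells.
                if (org == null || status == null) {
                    continue;
                }
                System.out.println(org.text() + " - " + status.text());
            }

            System.out.println("Done");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}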
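
Since the question mentions working from a saved .htm and asks how to save the data rather than just print it, here is a minimal sketch of the same selector approach that collects the rows into a map. The class name ClosingsParser, the parseClosings helper, the inline SAMPLE markup and the UTF-8 charset are all assumptions made for illustration; adjust the charset to match however the file was actually saved.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; the name and structure are illustrative only.
class ClosingsParser {

    // Small inline sample modeled on the table markup shown in the question.
    static final String SAMPLE =
        "<table><tr><td class=\"timestamp\">UPDATED SUNDAY</td></tr>"
        + "<tr><td bgcolor=\"#EEEEEE\"><font class=\"orgname\">AARP Foundation&nbsp;"
        + "[<a href=\"#\">WEB</a>]</font>: <font class=\"status\">Closed Tomorrow</font></td></tr>"
        + "<tr><td bgcolor=\"#DDDDDD\"><font class=\"orgname\">Alma&nbsp;"
        + "[<a href=\"#\">WEB</a>]</font>: <font class=\"status\">Closed Tomorrow</font></td></tr>"
        + "</table>";

    // Maps organization name to status, preserving the order of the table rows.
    static Map<String, String> parseClosings(Document doc) {
        Map<String, String> closings = new LinkedHashMap<>();
        for (Element cell : doc.select("td[bgcolor]")) {
            Element org = cell.select("font.orgname").first();
            Element status = cell.select("font.status").first();
            if (org == null || status == null) {
                continue; // not a closings row
            }
            // Turn the non-breaking space into a normal one and drop the "[WEB]" link label.
            String name = org.text()
                .replace('\u00A0', ' ')
                .replaceAll("\\s*\\[WEB\\]\\s*$", "")
                .trim();
            closings.put(name, status.text().trim());
        }
        return closings;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the saved .htm as the first argument; otherwise the sample is used.
        Document doc = args.length > 0
            ? Jsoup.parse(new File(args[0]), "UTF-8") // charset is an assumption
            : Jsoup.parse(SAMPLE);
        parseClosings(doc).forEach((name, status) -> System.out.println(name + " - " + status));
    }
}
```

A LinkedHashMap is used here so that lookups by organization name are straightforward while the original row order is kept. If the same organization can appear more than once, a list of pairs would be a safer choice.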

Upvotes: 1
