Tbuermann

Reputation: 93

How do I parse this HTML with Jsoup

I am trying to extract "Know your tractor" and "Shell Petroleum Company.1955". Bear in mind that this is just a snippet of the whole page and there is more than one H2/H3 tag. I would like to get the data from all of the H2 and H3 tags.

Here's the HTML: https://i.sstatic.net/Pif3B.png

The code I have so far is:

    ArrayList<String> arrayList = new ArrayList<String>();
    Document doc = null;
    try {
        doc = Jsoup.connect("http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp.scps=scope%3A%28ALL%29&frbg=&tab=default_tab&dstmp=1332103973502&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=t&vl(freeText0)=tractor&fn=search&vid=ABN_VU1").get();
        Elements heading = doc.select("h2.EXLResultTitle span");

        for (Element src : heading) {
            String j = src.text();
            System.out.println(j);  // check what's going into the array
            arrayList.add(j);
        }
    } catch (IOException e) {
        // Jsoup.connect(...).get() throws IOException if the request fails
        e.printStackTrace();
    }

How would I extract "Know your tractor" and "Shell Petroleum Company.1955"? Thanks for your help!

Upvotes: 5

Views: 705

Answers (1)

BalusC

Reputation: 1108547

Your selector only selects <span> elements which are inside <h2 class="EXLResultTitle">, while you actually need those <h2> elements themselves. So, just remove span from the selector:

    Elements headings = doc.select("h2.EXLResultTitle");

    for (Element heading : headings) {
        System.out.println(heading.text());
    }

You should be able to figure out the selector for <h3 class="EXLResultAuthor"> yourself based on the lesson learnt.
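If it helps to see both selectors at work, here is a minimal sketch that collects the <h2> and <h3> text in one pass. The HTML string is hypothetical: it only stands in for the screenshot, and the real page's nesting may differ. The class name HeadingExtractor and the method extractHeadings are likewise made up for illustration; only the two CSS class names come from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadingExtractor {

    // Returns the text of every matching <h2> and <h3>, in document order.
    static List<String> extractHeadings(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // A comma groups two selectors: an element matching either one is selected.
        for (Element heading : doc.select("h2.EXLResultTitle, h3.EXLResultAuthor")) {
            // text() also includes the text of nested elements such as <a> or <span>.
            texts.add(heading.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical markup (an assumption, not copied from the real page).
        String html =
              "<div>"
            + "<h2 class=\"EXLResultTitle\"><a href=\"#\">Know your <span>tractor</span></a></h2>"
            + "<h3 class=\"EXLResultAuthor\">Shell Petroleum Company.1955</h3>"
            + "</div>";

        for (String text : extractHeadings(html)) {
            System.out.println(text);
        }
    }
}
```

Selecting the heading elements rather than their inner <span> is what makes the full title come back: text() on the <h2> joins the text of all its descendants. For the live page, you would swap Jsoup.parse(html) for the Jsoup.connect(...).get() call from the question.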

Upvotes: 3