Bhetzie

Reputation: 2932

Parse anchor tags in java string

I'm creating a web crawler. I read the HTML of a page and stored it in a string, then found all of the anchor tags in the HTML and stored them in an ArrayList called anchorTags. I now need to get rid of the "a href=" part of each string in the list. To do this I wrote the following code; however, for some reason I am getting an out-of-bounds exception. Please note that I need to do this using only loops and ArrayLists:

    ArrayList<String> parsedLinks = new ArrayList<String>();
    String storeHTML = "";

    for (int i = 0; i < anchorTags.size(); i++) {
        String anchorTag = anchorTags.get(i);
        int hrefIndex = anchorTag.indexOf("a href=");

        if (hrefIndex > -1) {
            int beginQuote = anchorTag.indexOf("\"", hrefIndex);
            int EndQuote = anchorTag.indexOf("\"", beginQuote +1);

            if (EndQuote > beginQuote) {
                storeHTML.substring(beginQuote +1, EndQuote);
            }
        }
    }
    parsedLinks.add(storeHTML);
    System.out.println(parsedLinks);
    return parsedLinks;
}

Upvotes: 0

Views: 943

Answers (1)

John3136

Reputation: 29266

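Shouldn't

storeHTML.substring(beginQuote +1, EndQuote);

be

storeHTML = anchorTag.substring(beginQuote +1, EndQuote);

? storeHTML is an empty string, so calling substring on it with indices taken from anchorTag throws the out-of-bounds exception. The substring has to come from anchorTag, and its result has to be assigned, because Java strings are immutable and substring returns a new string.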
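For reference, here is a minimal sketch of the whole loop with that change applied, using only loops and ArrayLists as the question requires. The class name AnchorParser, the method name parseLinks, and the sample tags in main are made up for illustration; the question does not show the enclosing method. The sketch also makes two changes beyond the one-line fix, which are my assumptions about the intent: it adds each link inside the loop (the posted code calls parsedLinks.add once, after the loop, so at most one value would be stored), and it checks that an opening quote was actually found.

```java
import java.util.ArrayList;

class AnchorParser {
    // Hypothetical helper: returns the quoted href value of each anchor tag.
    static ArrayList<String> parseLinks(ArrayList<String> anchorTags) {
        ArrayList<String> parsedLinks = new ArrayList<String>();

        for (int i = 0; i < anchorTags.size(); i++) {
            String anchorTag = anchorTags.get(i);
            int hrefIndex = anchorTag.indexOf("a href=");

            if (hrefIndex > -1) {
                // Opening and closing quotes around the href value.
                int beginQuote = anchorTag.indexOf("\"", hrefIndex);
                int endQuote = anchorTag.indexOf("\"", beginQuote + 1);

                if (beginQuote > -1 && endQuote > beginQuote) {
                    // Take the substring from anchorTag itself and store it
                    // right away, one entry per tag.
                    parsedLinks.add(anchorTag.substring(beginQuote + 1, endQuote));
                }
            }
        }
        return parsedLinks;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        ArrayList<String> anchorTags = new ArrayList<String>();
        anchorTags.add("<a href=\"http://example.com/a\">A</a>");
        anchorTags.add("<a href=\"/relative/b\" class=\"x\">B</a>");
        anchorTags.add("<a name=\"no-link\">C</a>");

        // prints [http://example.com/a, /relative/b]
        System.out.println(parseLinks(anchorTags));
    }
}
```

Tags without "a href=" (the third sample) are skipped rather than added as empty strings. Note that this only handles double-quoted href values; single-quoted or unquoted attributes would need extra cases.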

Upvotes: 1
