Atalia.d

Reputation: 131

How to use Jsoup to get href link without the extra characters?

I have a list of Elements on which I'm calling jsoup's attr() method to get the href attribute. Here is part of my code:

    String searchTerm = "tutorial+programming+"+i_SearchPhrase;
    int num = 10;
    String searchURL = GOOGLE_SEARCH_URL + "?q="+searchTerm+"&num="+num;
    Document doc = Jsoup.connect(searchURL).userAgent("chrome/5.0").get();   
    Elements results = doc.select("h3.r > a");
    String linkHref;

    for (Element result : results) {
        linkHref = result.attr("href").replace("/url?q=","");
        //some more unrelated code...
    }

For example, when I use the search phrase "test", the first result in the list gives:

linkHref = https://www.tutorialspoint.com/software_testing/&sa=U&ved=0ahUKEwi_lI-T69jTAhXIbxQKHU1kBlAQFggTMAA&usg=AFQjCNHr6EzeYegPDdpHJndLJ-889Sj3EQ

where I only want: https://www.tutorialspoint.com/software_testing/

What is the best way to fix this? Do I just add some string operations on linkHref (which I know how to do), or is there a way to make the href attribute contain the shorter link to begin with? Thank you in advance.

Upvotes: 1

Views: 305

Answers (1)

Murat Karagöz

Reputation: 37584

If you always want to remove the query parameters, you can make use of String.indexOf(), e.g.:

    // Cut at the first '?' if there is one, otherwise at the first '&'.
    int lastPos = linkHref.indexOf("?");
    if (lastPos <= 0) {
        lastPos = linkHref.indexOf("&");
    }

    // Keep only the part before the separator, if one was found.
    if (lastPos > 0) {
        linkHref = linkHref.substring(0, lastPos);
    }
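If the target page's own query string should survive, an alternative is to read only the value of the q parameter from the raw href rather than cutting at the first "?" or "&". The following is a minimal sketch, not the code above: the class name GoogleHrefCleaner and the method extractTarget are hypothetical, and it assumes the href has the form "/url?q=<target>&sa=..." with any "?" or "&" belonging to the target percent-encoded inside the q value. That matches the example in the question, but it is an assumption about Google's markup and may not always hold. Note that URLDecoder also turns "+" into a space.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class GoogleHrefCleaner {
    // Assumed prefix of the redirect links, taken from the question.
    private static final String PREFIX = "/url?q=";

    // Returns the decoded value of the q parameter, or the href unchanged
    // if it does not start with the assumed prefix.
    static String extractTarget(String href) {
        if (!href.startsWith(PREFIX)) {
            return href;
        }
        String rest = href.substring(PREFIX.length());
        // The q value ends at the first raw '&' (start of Google's own parameters).
        int amp = rest.indexOf('&');
        String encoded = (amp >= 0) ? rest.substring(0, amp) : rest;
        // Requires Java 10+ for the Charset overload of decode().
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String href = "/url?q=https://www.tutorialspoint.com/software_testing/"
                + "&sa=U&ved=0ahUKEwi_lI-T69jTAhXIbxQKHU1kBlAQFggTMAA";
        // prints https://www.tutorialspoint.com/software_testing/
        System.out.println(extractTarget(href));
    }
}
```

In the loop from the question this would be called as extractTarget(result.attr("href")), without the replace("/url?q=", "") step, since the helper expects the prefix to still be there.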

Upvotes: 3
