nzxt90

Reputation: 47

Jsoup - Working with URLs

I have the code below:

public static void main (String args[]) throws IOException
{
    String absHref = "";
    String urlList = "";
    String relHref = "";

    Document doc = Jsoup.connect("https://www.planittesting.com").get();
    Elements links = doc.select("a[href]"); 
    for (Element link : links) 
    {
        absHref = link.attr("abs:href");
        urlList = absHref.toString();
        System.out.println(urlList);
    }
}

But the results have gaps in them. Am I missing something? I'm turning the relative URLs into absolute URLs, but some of them come back blank.


Upvotes: 2

Views: 124

Answers (2)

Stephan

Reputation: 43023

You can fine-tune the original CSS selector so that it skips non-navigable links:

a[href]:not([href~=(?i)^(javascript|tel|mailto)])

Description

a[href]                               /* Select any anchor with an href attribute ... */
:not(                                 /* ... whose href does not start ... */
 [href~=(?i)^(javascript|tel|mailto)] /* ... with javascript, tel or mailto (case-insensitive) */
)
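
To see which href values the regular expression inside the selector would exclude, here is a minimal plain-Java sketch. It assumes jsoup applies the `[attr~=regex]` pattern with "find" semantics, and it uses made-up sample hrefs; the class and method names (`HrefSchemeFilter`, `isKept`) are hypothetical and not part of jsoup.

```java
import java.util.regex.Pattern;

class HrefSchemeFilter {
    // Same regex as in the selector: case-insensitive, anchored at the start of the value.
    private static final Pattern EXCLUDED =
            Pattern.compile("(?i)^(javascript|tel|mailto)");

    // Hypothetical helper: true when the href does NOT start with an excluded prefix.
    static boolean isKept(String href) {
        return !EXCLUDED.matcher(href).find();
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration only.
        String[] hrefs = {
            "/uk/Contact",
            "https://www.planittesting.com/",
            "JavaScript:void(0)",
            "mailto:someone@example.com",
            "tel:+10000000000",
            "telephone.html"
        };
        for (String href : hrefs) {
            System.out.println((isKept(href) ? "keep  " : "skip  ") + href);
        }
    }
}
```

Note that the pattern has no trailing colon, so a relative link such as `telephone.html` is skipped as well. If that matters for your site, a pattern like `(?i)^(javascript|tel|mailto):` is stricter.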


Upvotes: 0

Davide Pastore

Reputation: 8738

If you use link.attr("href") instead, you can see that these href attributes are not empty; they contain something other than a URL, such as:

javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl01$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl02$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl03$lbChangeSite','')
javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl04$lbChangeSite','')

With link.attr("abs:href"), you get a blank value for anything that cannot be resolved to an absolute URL, such as these javascript: entries.
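
As a rough illustration of why this happens, here is a plain-Java sketch. It assumes the resolution behaves like `java.net.URL`, which rejects schemes it has no handler for; this matches the output in this answer, but jsoup's actual implementation may differ between versions. `AbsHrefSketch` and `toAbsolute` are hypothetical names, not jsoup API.

```java
import java.net.MalformedURLException;
import java.net.URL;

class AbsHrefSketch {
    // Hypothetical helper: resolve href against base, or return "" when no URL can be built.
    @SuppressWarnings("deprecation") // the URL constructors are deprecated on recent JDKs
    static String toAbsolute(String base, String href) {
        try {
            return new URL(new URL(base), href).toExternalForm();
        } catch (MalformedURLException e) {
            // Thrown for schemes without a handler, e.g. "unknown protocol: javascript".
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "https://www.planittesting.com/uk/Home";
        String[] hrefs = {
            "/uk/Contact",
            "#main",
            "javascript:__doPostBack('p$lt$ctl00$GeoLocator$rptCultures$ctl01$lbChangeSite','')"
        };
        for (String href : hrefs) {
            // Brackets make the empty result visible.
            System.out.println("[" + toAbsolute(base, href) + "]  <-  " + href);
        }
    }
}
```

Under the same assumption, a mailto: link is not blanked because the JDK ships a handler for that scheme, which is consistent with the mailto: entry that survives in the output below.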

You can fix it by adding a simple check that skips the empty values:

package com.github.davidepastore.stackoverflow35544869;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Stackoverflow 35544869 question.
 *
 */
public class App 
{
    public static void main( String[] args ) throws IOException
    {
        Document doc = Jsoup.connect("https://www.planittesting.com").get();
        Elements links = doc.select("a[href]");
        for (Element link : links)
        {
            // abs:href is empty when the href cannot be made absolute (e.g. javascript:...)
            String absHref = link.attr("abs:href");
            if (!absHref.isEmpty()) {
                System.out.println(absHref);
            }
        }
    }
}

Output:

https://www.planittesting.com/uk/Home#main
https://www.planittesting.com/uk/Home
https://www.planittesting.com/uk/Home
https://www.linkedin.com/company/planit-software-testing
https://www.planittesting.com/uk/Course-Bookings
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/
https://www.planittesting.com/uk/Services
https://www.planittesting.com/Services/Functional-Testing
https://www.planittesting.com/Services/Test-Automation
https://www.planittesting.com/Services/Performance-Testing
https://www.planittesting.com/Services/Accessibility-Testing
https://www.planittesting.com/Services/Security-Testing
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Services/Digital-Testing
https://www.planittesting.com/Services/Agile-Testing
https://www.planittesting.com/Services/Non-Agile-Testing
https://www.planittesting.com/Services/Test-Strategy
https://www.planittesting.com/Services/Test-Management
https://www.planittesting.com/Services/Process-Improvement
https://www.planittesting.com/Services/DevOps-Solutions
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Application-Monitoring-Solutions
https://www.planittesting.com/Services/Test-Management-as-a-Service
https://www.planittesting.com/Services/Performance-Testing-Solutions
https://www.planittesting.com/Services/Tools-Licensing
https://www.planittesting.com/Services/On-site-Testing
https://www.planittesting.com/Services/Off-site-Testing
https://www.planittesting.com/Services/Off-shore-Testing
https://www.planittesting.com/uk/Training
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/ISTQB-Foundation-Certificate
https://www.planittesting.com/Training/ISTQB-Advanced-Test-Analyst
https://www.planittesting.com/Training/ISTQB-Advanced-Test-Manager
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/Agile
https://www.planittesting.com/Training/ISTQB-Foundation-Agile-Tester-Extension
https://www.planittesting.com/Training/Certified-Agile-Essentials
https://www.planittesting.com/Training/Certified-Agile-Business-Analysis
https://www.planittesting.com/Training/Certified-Agile-Tester
https://www.planittesting.com/Training/Business-Analysis
https://www.planittesting.com/Training/BCS-Business-Analysis-Foundation
https://www.planittesting.com/Training/BCS-Requirements-Engineering-Certificate
https://www.planittesting.com/Training/BCS-Modelling-Business-Processes
https://www.planittesting.com/Training/BCS-Business-Analysis-Practice
https://www.planittesting.com/Training/Classroom
https://www.planittesting.com/Training/Virtual-Learning
https://www.planittesting.com/Training/Schedule
https://www.planittesting.com/uk/Insights
https://www.planittesting.com/uk/About
https://www.planittesting.com/uk/Join-Our-Team
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/Services
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Planit-Testing-Index
https://www.planittesting.com/Training/ISTQB-Foundation-Agile-Tester-Extension
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Functional-Testing
https://www.planittesting.com/Services/Test-Automation
https://www.planittesting.com/Services/Performance-Testing
https://www.planittesting.com/Services/Accessibility-Testing
https://www.planittesting.com/Services/Security-Testing
https://www.planittesting.com/Services/Mobile-App-Testing
https://www.planittesting.com/Services/Digital-Testing
https://www.planittesting.com/Services/Agile-Testing
https://www.planittesting.com/Services/Non-Agile-Testing
https://www.planittesting.com/Services/Test-Strategy
https://www.planittesting.com/Services/Test-Management
https://www.planittesting.com/Services/Process-Improvement
https://www.planittesting.com/Services/DevOps-Solutions
https://www.planittesting.com/Services/Application-Monitoring-Solutions
https://www.planittesting.com/Services/Performance-Testing-Solutions
https://www.planittesting.com/Services/Test-Management-as-a-Service
https://www.planittesting.com/Services/Service-Virtualisation
https://www.planittesting.com/Services/Tools-Licensing
https://www.planittesting.com/Services
https://www.planittesting.com/Training/Software-Testing
https://www.planittesting.com/Training/Agile
https://www.planittesting.com/Training/Business-Analysis
https://www.planittesting.com/Training
https://www.planittesting.com/Insights/Cricket-Australia-Case-Study
https://www.planittesting.com/Insights/Lend-Lease-Case-Study
https://www.planittesting.com/Insights/Panviva-Case-Study
https://www.planittesting.com/Contact
https://www.planittesting.com/
https://www.linkedin.com/company/planit-software-testing
https://www.linkedin.com/grp/home?gid=4561841
mailto:[email protected]
https://www.planittesting.com/uk/Services
https://www.planittesting.com/uk/Services/Functional-Testing
https://www.planittesting.com/uk/Services/Test-Automation
https://www.planittesting.com/uk/Services/Performance-Testing
https://www.planittesting.com/uk/Services/Accessibility-Testing
https://www.planittesting.com/uk/Tools
https://www.planittesting.com/uk/Tools/Service-Virtualisation
https://www.planittesting.com/uk/Tools/Application-Monitoring
https://www.planittesting.com/uk/Tools/Performance-Testing-Solutions
https://www.planittesting.com/uk/Tools/Test-Management-as-a-Service
https://www.planittesting.com/uk/Training
https://www.planittesting.com/uk/Training/Software-Testing
https://www.planittesting.com/uk/Training/Business-Analysis
https://www.planittesting.com/uk/Training/Agile
https://www.planittesting.com/uk/Training/Full-Course-Schedule
https://www.planittesting.com/uk/About
https://www.planittesting.com/uk/About/Planit-Testing-Index
https://www.planittesting.com/uk/About/Jobs-Board
https://www.planittesting.com/uk/About/Careers-at-Planit
https://www.planittesting.com/uk/About/Bootcamp
https://www.planittesting.com/uk/Contact
https://www.planittesting.com/uk/Contact/Office-1
https://www.planittesting.com/uk/Contact/Office-2
https://www.planittesting.com/uk/Contact/Office-3
https://www.planittesting.com/uk/Contact/Office-4
https://www.planittesting.com/uk/Footer-Navigation/Privacy
https://www.planittesting.com/uk/Footer-Navigation/Terms-Conditions

Upvotes: 1
