Reputation: 492

Parsing string based on different delimiters

<a href="http://www.google.com">Google</a><br/>

I'm trying to extract the link http://www.google.com as well as the text Google from this string.

Upvotes: 2

Answers (3)

Engine Bai

Reputation: 626

I use the following helper method in my web crawler to pull the href value out of a line of HTML. Note that it returns only the link, not the anchor text.

Here is the code:

public static String filterHref( String hrefLine )
{
    // Split case-insensitively so that HREF="..." is handled like href="..."
    String[] hrefSplit = hrefLine.split( "(?i)href", 2 ); // second part: ="..." alt="...">...<...>
    if ( hrefSplit.length < 2 )
        return ""; // no href attribute on this line

    String link = hrefSplit[ 1 ].trim().split( "\\s+" )[ 0 ]; // "=value" up to the next whitespace
    if ( link.contains( ">" ) )
        link = link.substring( 0, link.indexOf( ">" ) ); // drop everything after the tag closes
    link = link.replaceFirst( "=", "" ); // remove the attribute's equals sign
    link = link.replace( "\"", "" ).replace( "'", "" ).trim(); // strip the quotes
    return link;
}

Upvotes: 0

akaya

Reputation: 1140

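You can extract both with a simple regex: group 1 captures the href value and group 2 captures the link text. The pattern assumes href is the only attribute and is double-quoted.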

String s = "<a href=\"http://www.google.com\">Google</a><br/>";
Pattern pattern = Pattern.compile("<a\\s+href=\"([^\"]*)\">([^<]*)</a>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
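
If the input may contain several links, or anchors with extra attributes or single-quoted values, the same idea can be extended by looping over matcher.find(). The following is only a minimal sketch under those assumptions, not a robust HTML parser; the class name LinkExtractor, the method extractLinks, and the relaxed pattern are made up for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class LinkExtractor {

    // Matches <a ... href="URL" ...>TEXT</a>, accepting single or double
    // quotes and other attributes before or after href.
    // Group 1 = quote character, group 2 = URL, group 3 = link text.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1[^>]*>([^<]*)</a>",
            Pattern.CASE_INSENSITIVE);

    // Returns one {url, text} pair per anchor found, in input order.
    static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            links.add(new String[] { matcher.group(2).trim(), matcher.group(3).trim() });
        }
        return links;
    }

    public static void main(String[] args) {
        String s = "<a href=\"http://www.google.com\">Google</a><br/>"
                + "<a class='x' href='http://example.com'>Example</a>";
        for (String[] link : extractLinks(s)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

Like the original pattern, this only captures link text that contains no nested tags. For arbitrary real-world HTML a proper parser is usually the safer choice.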

Upvotes: 0

Adarsh

Reputation: 3651

Splitting on the double-quote character does the job for input of exactly this shape.

    String url = "<a href=\"http://www.google.com\">Google</a><br/>";
    String[] separate = url.split("\"");
    String URL = separate[1];
    String text = separate[2].substring(1).split("<")[0];
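
The snippet above throws an ArrayIndexOutOfBoundsException when the input contains fewer than two double quotes or nothing after the closing quote. A guarded variant might look like the sketch below; the class SplitLinkParser and the method parse are hypothetical names, and returning null for unexpected input is just one possible convention.

```java
// Hypothetical helper: the same split-based idea, with bounds checks.
class SplitLinkParser {

    // Returns {url, text}, or null when the input does not look like
    // <a href="URL">TEXT</a>.
    static String[] parse(String html) {
        String[] parts = html.split("\"");
        if (parts.length < 3) {
            return null; // fewer than two double quotes, or nothing after the URL
        }
        String rest = parts[2];                  // e.g. ">Google</a><br/>"
        int open = rest.indexOf('>');            // end of the opening <a ...> tag
        int close = rest.indexOf('<', open + 1); // start of the closing </a> tag
        if (open < 0 || close < 0) {
            return null;
        }
        return new String[] { parts[1], rest.substring(open + 1, close) };
    }

    public static void main(String[] args) {
        String[] result = parse("<a href=\"http://www.google.com\">Google</a><br/>");
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```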

Upvotes: 1
