Sillicon.Dragons

Reputation: 419

Regex in Java for URL filtering

I am using the following code to convert plain-text URLs into HTML hyperlinks.

message = message.replaceAll("(?:https?|ftps?|http?)://[\\w/%.\\-?&=]+",
        "<a href='$0' target='_blank'>$0</a>").replaceAll(
        "(www\\.)[\\w/%.\\-?&=]+", "<a href='http://$0' target='_blank'>$0</a>");

But I notice that certain URL combinations are not converted into HTML hyperlinks. Can anyone advise how to improve the code so that it matches those cases as well?

[Image of the failing test URLs; not reproduced here]

Upvotes: 1

Views: 1717

Answers (3)

Rohaq

Reputation: 2046

Here's an example that should match most http and https URLs:

String input = "http://rs43lt13.rapidshare.com/#!download|46311|44541812469|fairy_tgail_045_sd.mp4";
String re_url="((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))";

Pattern url_pattern = Pattern.compile(re_url,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = url_pattern.matcher(input);
if (m.find()) {
  System.out.print("Found URL!" + m.group(1));
}

Don't forget to import java.util.regex.* beforehand.
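To tie this back to the question, here is a minimal sketch of how the same pattern could be used to wrap every match in an anchor tag instead of just printing the first one. The class name RohaqLinkDemo and the method linkify are made up for illustration, and Pattern.DOTALL is dropped because the pattern contains no unescaped dot. The pattern only looks for http and https, and it stops at whitespace or a double quote, so trailing punctuation such as a full stop would end up inside the link.

```java
import java.util.regex.Pattern;

final class RohaqLinkDemo {
    // Same expression as above, compiled once and reused.
    static final Pattern URL_PATTERN = Pattern.compile(
            "((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: wraps every match (group 1) in an anchor tag.
    static String linkify(String text) {
        return URL_PATTERN.matcher(text)
                .replaceAll("<a href='$1' target='_blank'>$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify(
                "get it at http://rs43lt13.rapidshare.com/#!download|46311|44541812469|fairy_tgail_045_sd.mp4 today"));
    }
}
```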

Upvotes: 0

Nishant

Reputation: 55866

I tried a few times and came up with a tricky pattern that works for all of your cases and produces valid URLs, except that a trailing / is not handled very elegantly. I hope someone can suggest a quick fix for that.

Here is the code:

    String s="stackoverflow " +
            "http://naishe.blogspot.com " +
            "http://tw.com/#!/someTEXTs  " +
            "http://ts123t1.rapi.com/#!download|13321|1313|fairy_tale.mp4 " +
            "http://www.google.com/ " +
            "https://www.google.com/. " +
            "google.com " +
            "google.com, " +
            "google.com/test " +
            "123.com/test " +
            "ex-ample.com " +
            "http://ex-ample.com/test-url_chars?param1=val1&;par2=val+with%20spaces " +
            "something else";
    Pattern trimmer = Pattern.compile("(?:\\b(?:http|ftp|www\\.)\\S+\\b)|(?:\\b\\S+\\.com\\S*\\b)");
    Matcher m = trimmer.matcher(s);
    StringBuffer out = new StringBuffer();
    int i = 1;
    System.out.println(trimmer.toString());
    while(m.find()){
        System.out.println("|"+m.group()+"|");
        m.appendReplacement(out, "<a href=\""+m.group()+"\">URL"+ i++ +"</a>");
    }
    m.appendTail(out);
    System.out.println(out+"!");

Here is the output:

(?:\b(?:http|ftp|www\.)\S+\b)|(?:\b\S+\.com\S*\b)
|http://naishe.blogspot.com|
|http://tw.com/#!/someTEXTs|
|http://ts123t1.rapi.com/#!download|13321|1313|fairy_tale.mp4|
|http://www.google.com|
|https://www.google.com|
|google.com|
|google.com|
|google.com/test|
|123.com/test|
|ex-ample.com|
|http://ex-ample.com/test-url_chars?param1=val1&;par2=val+with%20spaces|

stackoverflow <a href="http://naishe.blogspot.com">URL1</a> 
<a href="http://tw.com/#!/someTEXTs">URL2</a>  
<a href="http://ts123t1.rapi.com/#!download|13321|1313|fairy_tale.mp4">URL3</a>
 <a href="http://www.google.com">URL4</a>/ 
<a href="https://www.google.com">URL5</a>/.
 <a href="google.com">URL6</a> <a href="google.com">URL7</a>,
 <a href="google.com/test">URL8</a> <a href="123.com/test">URL9</a>
 <a href="ex-ample.com">URL10</a>
 <a href="http://ex-ample.com/test-url_chars?param1=val1&;par2=val+with%20spaces">URL11</a> something else!

Notice the trailing / left outside the links for URL4 and URL5? :)
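One possible way to pull the trailing / into the link is to allow an optional / right after the closing word boundary of each alternative. The sketch below tries that and makes two further changes that are assumptions, not part of the code above: matches without a scheme get http:// prepended in the href so they do not become relative links, and the replacement goes through Matcher.quoteReplacement so that a $ or \ in a URL is not read as a group reference. The class name TrailingSlashLinker and the method linkify are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrailingSlashLinker {
    // Same two alternatives as the pattern above, each followed by an optional '/'.
    static final Pattern URL = Pattern.compile(
            "(?:\\b(?:http|ftp|www\\.)\\S+\\b/?)|(?:\\b\\S+\\.com\\S*\\b/?)");

    static String linkify(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            // Assumption: a match without an explicit scheme is reachable over http.
            String href = url.matches("(?i)(?:https?|ftps?)://.*") ? url : "http://" + url;
            String link = "<a href=\"" + href + "\">" + url + "</a>";
            // quoteReplacement keeps '$' and '\' in the URL literal.
            m.appendReplacement(out, Matcher.quoteReplacement(link));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify(
                "http://www.google.com/ https://www.google.com/. google.com, something else"));
    }
}
```

With this change the / stays inside the anchor, while the full stop and the comma that follow a URL stay outside it.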

A friendly suggestion to the OP: when providing test cases, please choose a format that we can copy from. One can't copy from a JPEG into a text editor.

Upvotes: 1

C.Champagne

Reputation: 5489

URLEncoder.encode(String url, String encoding) should help you, no?
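For reference, here is a small sketch of what that call does. URLEncoder performs form-style encoding, so it is usually applied to a single parameter value rather than to a whole URL; passing a complete URL also encodes the : and / characters. The class name EncodeDemo and the helper encode are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeDemo {
    // Hypothetical helper that hides the checked exception.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Encoding one query value: spaces become '+'.
        System.out.println("http://ex-ample.com/test?par2=" + encode("val with spaces"));
        // Encoding a whole URL also escapes ':' and '/'.
        System.out.println(encode("http://ex-ample.com/"));
    }
}
```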

Upvotes: 0
