kee

Reputation: 11619

Java String.matches regex

I am trying to check whether a given host name appears in a list of hosts stored as a comma-separated string like the following:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net"; // should be a match

// here is a test for host1     
if (list.matches(".*[,^]" + host1 + "[$,].*")) {
    System.out.println(host1 + " matched");
}
else {
    System.out.println(host1 + " not matched");
}

But I got "not matched" for host1 (aa.com). I am not very familiar with regex, so please correct me!

BTW, I don't want a solution that splits the host list into an array and then matches against each element. That was too slow because the host list can be quite long. The regex approach may be even worse, but I wanted to make it work first.

Upvotes: 1

Views: 871

Answers (5)

Nicko

Reputation: 61

I also think regexes are too slow if you are looking for an exact match, so I wrote a method that looks for occurrences of the host name in the list and checks, for each one, that it is not part of a longer host name (the way "a.com" is part of "aa.com"). If it is not, the result is true: the host is in the list. Here's the code:

boolean containsHost(String list, String host) {
    boolean result = false;
    int i = -1;
    while((i = list.indexOf(host, i + 1)) >= 0) { // while there is next match
        if ((i == 0 || list.charAt(i - 1) == ',') // beginning of the list or has a comma right before it
                && (i == (list.length() - host.length()) // end of the list 
                || list.charAt(i + host.length()) == ',')) { // or has a comma right after it
            result = true;
            break;
        }
    }
    return result;
}

But then I thought it would be even faster to check just three cases: a match at the beginning, in the middle, or at the end of the list, which can be done with the startsWith, contains and endsWith methods respectively. A list that consists of a single host needs one extra equals check. Here's the second option, which I would prefer in your case:

boolean containsHostShort(String list, String host) {
    return list.equals(host)                   // the list is exactly this one host
            || list.contains("," + host + ",") // in the middle
            || list.startsWith(host + ",")     // at the beginning
            || list.endsWith("," + host);      // at the end
}

Update: ZouZou's comment on your post also seems good. I recommend comparing the speed of each option on a list of about the size you have in the real situation and choosing the fastest one.

Upvotes: 1

KPERI

Reputation: 233

Try this:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net"; // should be a match

//For host1
Pattern p1 = Pattern.compile("\\b[A-Za-z]{2}\\.com"); // \\. matches a literal dot
Matcher m1 = p1.matcher(list);

if(m1.find()){
   System.out.println(host1 + " matched");
}else{
   System.out.println(host1 + " not matched");
}

//for host2
p1 = Pattern.compile("\\b[A-Za-z]{1}\\.com"); // \\. matches a literal dot
m1 = p1.matcher(list);

if(m1.find()){
     System.out.println(host2 + " matched");
}else{
     System.out.println(host2 + " not matched");
}

//and so on...

The \b means word boundary (here, the start of a word). The rest of the pattern means n letters in the range A-Z or a-z followed by .com. An unescaped . in a regex matches any character; to match a literal dot, write \\. in the Java string.
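
The patterns above hard-code the length of each host name. A more general variant, given here only as a sketch, builds the pattern from the host itself. It assumes host names never contain commas; the class and method names (HostRegexSketch, hostPattern, containsHost) are made up for illustration. Note that ^ and $ act as anchors only outside a character class; inside [...] they are ordinary characters, so the boundaries are written as alternations.

```java
import java.util.regex.Pattern;

class HostRegexSketch {
    // Builds a pattern that finds `host` only as a whole comma-delimited entry.
    static Pattern hostPattern(String host) {
        // Pattern.quote makes the dots in the host name literal.
        // (?:^|,) = start of input or a comma; (?:,|$) = a comma or end of input.
        return Pattern.compile("(?:^|,)" + Pattern.quote(host) + "(?:,|$)");
    }

    static boolean containsHost(String list, String host) {
        // find() searches anywhere in the input, so no leading/trailing .* is needed.
        return hostPattern(host).matcher(list).find();
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHost(list, "aa.com")); // true
        System.out.println(containsHost(list, "a.com"));  // false
        System.out.println(containsHost(list, "ff.net")); // true
    }
}
```

If the same host is looked up repeatedly, the compiled Pattern can be kept and reused instead of being rebuilt on every call.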

Upvotes: 0

user3662273

Reputation: 554

This works perfectly, without regex:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com";
String host2 = "a.com";
String host3 = "ff.net";
boolean checkingFlag = false;
String[] arrayList = list.split(",");
System.out.println(arrayList.length);

for (int i = 0; i < arrayList.length; i++) {
    // here is a test for host1
    if (arrayList[i].equalsIgnoreCase(host1)) {
        checkingFlag = true;
        break; // stop at the first match
    }
}

if (checkingFlag)
    System.out.println("Matched");
else
    System.out.println("Not matched");

A loop over 1 million records takes barely 20-30 milliseconds to execute. Following your comment, I have edited the answer; you can check it with this:

long startingTime = System.currentTimeMillis();

for (int i = 0; i < 1000000; i++) {
    if (i == 999999)
        checkingFlag = true;
}
long endingTime = System.currentTimeMillis();
System.out.println("total time in millisecond:" + (endingTime - startingTime));

Upvotes: 0

mrres1

Reputation: 1155

You can stream the list and use a lambda with anyMatch to get a boolean for the match.

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net";  // should be a match

ArrayList<String> alist = new ArrayList<String>();

for(String item : list.split("\\,"))
{
    alist.add(item);
}

boolean contains_host1 = alist.stream().anyMatch(b -> b.equals(host1));
boolean contains_host2 = alist.stream().anyMatch(b -> b.equals(host2));
boolean contains_host3 = alist.stream().anyMatch(b -> b.equals(host3));

System.out.println(contains_host1);
System.out.println(contains_host2);
System.out.println(contains_host3);

Console output:

true
false
true
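
The same idea can be written more compactly by streaming the split array directly. The sketch below also shows a variant for the case where the same list is queried many times, which is an assumption about the use case and not something stated in the question: the list is split once into a HashSet and each lookup becomes a hash lookup. The names HostLookupSketch, containsHost and toHostSet are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HostLookupSketch {
    // One-off lookup: stream the split entries directly, without an intermediate list.
    static boolean containsHost(String list, String host) {
        return Arrays.stream(list.split(",")).anyMatch(host::equals);
    }

    // Repeated lookups (assumption: the same list is queried many times):
    // split once into a HashSet and reuse the set for every lookup.
    static Set<String> toHostSet(String list) {
        return new HashSet<>(Arrays.asList(list.split(",")));
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHost(list, "aa.com")); // true
        System.out.println(containsHost(list, "a.com"));  // false

        Set<String> hosts = toHostSet(list);
        System.out.println(hosts.contains("ff.net"));     // true
    }
}
```

Whether the set pays off depends on how often the list changes compared with how often it is searched, so it is worth measuring on realistic data.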

Upvotes: 0

Farhad Alizadeh Noori

Reputation: 2306

As mentioned in the comments, you shouldn't be using matches, because it tries to match the regex pattern against the entire comma-delimited string. That is not what you want: you are trying to detect whether a given substring occurs in the comma-separated source string.

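To do that with a regex, you would build the pattern from the host name and call find() on a Matcher instead of using matches. However, you can just use a plain substring search (String.indexOf or String.contains), which avoids the overhead of regex compilation, as long as you also check that the hit is bounded by commas or by the ends of the list.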

Regexes are used to match strings that could have variations in the pattern matched. Never use a regex when you want to do exact string matching.
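
A minimal sketch of the plain substring approach, assuming host names never contain commas (HostContainsSketch and containsHost are made-up names): wrapping both the list and the host in commas turns the first and last entries into ordinary middle entries, so a single contains call covers every position.

```java
class HostContainsSketch {
    static boolean containsHost(String list, String host) {
        // ",aa.com,bb.com," contains ",aa.com," but not ",a.com,".
        return ("," + list + ",").contains("," + host + ",");
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHost(list, "aa.com")); // true
        System.out.println(containsHost(list, "a.com"));  // false
        System.out.println(containsHost(list, "ff.net")); // true
    }
}
```

The concatenation copies the whole list on every call. For a very long list it may be cheaper to store the list already wrapped in commas, or to use the indexOf loop from Nicko's answer, which makes no copy.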

Upvotes: 0
