nullByteMe

Reputation: 6391

How to use regex with pattern matcher against multiple strings?

I'm reading in a list of strings from a List<String>. The strings look like this:

blah1
blah2
blah3
blah4

In Java, I'd like to build a regex that starts with a prefix pattern like (myString/|yourString.), followed by any one of the strings in the list above, and then match it against the lines of a file.

So I do this (the code below is just snippets):

String pattern = "(myString/|yourString.)";
private String listAsString;

private void createListAsStrings() {
   StringBuilder sb = new StringBuilder();

   for(String string : stringList) {
      sb.append(string + "|");  // using the pipe hoping it will do an OR in the regex
   }

   listAsString = sb.toString();
}

To build the pattern, I'm trying to do the following:

Pattern p = Pattern.compile(pattern + listAsString);

But when I run the matcher, it doesn't try each string from the list I built with my StringBuilder. The other problem is that the pattern ends with a trailing |.

Is there a way to match myString/blah1 or yourString.blah1 or myString/blah2, etc., using a regex against each line in a file?

There is a lot of code, so I just posted what seemed relevant.

Upvotes: 1

Views: 343

Answers (2)

ajb

Reputation: 31689

I think the basic problem is that your pattern (ignoring the trailing | problem) is something like

(myString/|yourString.)blah1|blah2|blah3

which will match one of these

myString/blah1
yourString.blah1
blah2
blah3

That's how operator precedence works in regexes: | binds more loosely than concatenation, so the prefix group attaches only to blah1. You need an extra set of parentheses around the alternatives built from the list (plus see the other answers about \Q..\E and avoiding the bar at the end of the string).
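
To make the precedence point concrete, here is a minimal sketch comparing the ungrouped pattern with a grouped one. The class name PrecedenceDemo and the helper matchesWhole are made up for illustration, and escaping the dot after yourString in the grouped version is my own assumption about what the question intends (unescaped, it matches any character).

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PrecedenceDemo {
    // Without grouping, '|' splits the whole expression into three alternatives:
    // "(myString/|yourString.)blah1", "blah2" and "blah3".
    static final Pattern UNGROUPED =
            Pattern.compile("(myString/|yourString.)blah1|blah2|blah3");

    // Grouping the list items ties the prefix to every alternative.
    // Assumption: the dot after yourString is meant literally, so it is escaped.
    static final Pattern GROUPED =
            Pattern.compile("(myString/|yourString\\.)(blah1|blah2|blah3)");

    // True only if the entire string matches the pattern.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(UNGROUPED, "blah2"));          // true
        System.out.println(matchesWhole(UNGROUPED, "myString/blah2")); // false
        System.out.println(matchesWhole(GROUPED, "blah2"));            // false
        System.out.println(matchesWhole(GROUPED, "myString/blah2"));   // true
    }
}
```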

Upvotes: 0

Sergey Kalinichenko

Reputation: 726479

The expression that you are looking to build should be as follows:

myString/(?:\Qblah1\E|\Qblah2\E)

You need to wrap the strings blah1, blah2, etc. in \Q...\E in case the strings contain regex metacharacters. To avoid the trailing |, append the separator before every word except the first, using a boolean variable that indicates whether this is the first iteration through the loop:

StringBuilder sb = new StringBuilder();
boolean isFirst = true;
for(String word : stringList) {
    if (!isFirst) {
        sb.append('|');
    } else {
        isFirst = false;
    }
    sb.append("\\Q");
    sb.append(word);
    sb.append("\\E");
}
String regex = "myString/" + "(?:" + sb + ")";
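
As a possible variation on the loop above, the same regex can be assembled with Pattern.quote and Collectors.joining, which avoids the first-iteration flag. This is only a sketch under some assumptions: the class PrefixedWordMatcher and its methods are hypothetical names, the yourString. prefix is added because the question asks for it, and the dot is escaped on the assumption that it is meant literally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; names are illustrative, not from the question.
class PrefixedWordMatcher {
    // Builds (?:myString/|yourString\.)(?:\Qw1\E|\Qw2\E|...) from the word list.
    static Pattern build(List<String> words) {
        String alternatives = words.stream()
                .map(Pattern::quote)               // quote each word as a literal
                .collect(Collectors.joining("|")); // separator only between items
        return Pattern.compile("(?:myString/|yourString\\.)(?:" + alternatives + ")");
    }

    // Keeps the lines that contain a match anywhere (find, not matches).
    static List<String> matchingLines(List<String> lines, Pattern p) {
        return lines.stream()
                .filter(line -> p.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern p = build(Arrays.asList("blah1", "blah2", "a+b"));
        List<String> lines = Arrays.asList(
                "x = myString/blah1;",
                "yourString.blah2",
                "blah2 alone",
                "myString/a+b",
                "myString/aab");
        matchingLines(lines, p).forEach(System.out::println);
    }
}
```

Because find() looks for a match anywhere in the line, a word such as blah1 would also match inside myString/blah10. If that matters, add a word boundary or use matches() on the whole line.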

Upvotes: 2
