Reputation: 49299
I need to create a regex that can match multiple strings. For example, I want to find all the instances of "good" or "great". I found some examples, but what I came up with doesn't seem to work:
\b(good|great)\w*\b
Can anyone point me in the right direction?
Edit: I should note that I don't want to just match whole words. For example, I may want to match "ood" or "reat" as well (parts of the words).
Edit 2: Here is some sample text: "This is a really great story." I might want to match "this" or "really", or I might want to match "eall" or "reat".
Upvotes: 19
Views: 105635
Reputation: 67
Just check the Boolean that Regex.IsMatch() returns.
if (Regex.IsMatch(line, "condition") && Regex.IsMatch(line, "condition2"))
With &&, the line must match both patterns; use || instead if matching either one is enough.
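For comparison, here is a minimal Python sketch of the same idea: run each pattern separately and combine the Boolean results. The helper names matches_all and matches_any are made up for this illustration and are not part of any library.

```python
import re

def matches_all(line, patterns):
    """True when every pattern is found somewhere in line (the && case)."""
    return all(re.search(p, line) for p in patterns)

def matches_any(line, patterns):
    """True when at least one pattern is found somewhere in line (the || case)."""
    return any(re.search(p, line) for p in patterns)

line = "This is a really great story."
print(matches_all(line, ["good", "great"]))  # False: "good" is missing
print(matches_any(line, ["good", "great"]))  # True: "great" is present
```

Because re.search looks anywhere in the string, fragments such as "reat" or "eall" are found as well as whole words.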
Upvotes: 1
Reputation: 13421
If you can guarantee that there are no reserved regex characters in your word list (or if you escape them), you could just use this code to make a big word list into @"(a|big|word|list)". There's nothing wrong with the | operator as you're using it, as long as those () surround it. It sounds like the \w* and the \b patterns are what are interfering with your matches.
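// Example word list; pass each entry through Regex.Escape if it may contain reserved characters.
String[] pattern_list = { "good", "great" };
String regex = String.Format("({0})", String.Join("|", pattern_list));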
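The same join-with-| approach can be sketched in Python, with the escaping step included. The function name build_alternation is hypothetical, and two choices here are assumptions that go beyond the answer: entries are sorted longest first, and matching is case-insensitive so that "this" finds "This" in the sample sentence.

```python
import re

def build_alternation(words):
    # Escape each entry so characters such as "." or "+" are matched literally.
    # Longest entries go first so a longer alternative is tried before a shorter one.
    ordered = sorted(words, key=len, reverse=True)
    body = "|".join(re.escape(w) for w in ordered)
    return re.compile("(" + body + ")", re.IGNORECASE)

pattern = build_alternation(["this", "really", "eall", "reat"])
text = "This is a really great story."
print(pattern.findall(text))  # ['This', 'really', 'reat']
```

Note that "eall" is not reported separately, because findall returns non-overlapping matches and "really" has already consumed those characters.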
Upvotes: 25
Reputation: 3443
I think you are asking for something you don't really mean. If you want to search for any part of a word, you are literally searching for letters.
E.g. searching for {Jack, Jim} in "John and Shelly are cool"
means searching for all the letters in the names {J,a,c,k,i,m}:
*J*ohn *a*nd Shelly *a*re
For that you don't need regex :)
In my opinion, a suffix tree can help you with that:
http://en.wikipedia.org/wiki/Suffix_tree#Functionality
Enjoy.
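To illustrate the "no regex needed" point, here is a rough Python sketch that locates fixed fragments with plain string search. It is not a suffix tree, just the simplest substitute for small inputs, and find_all is a hypothetical helper name.

```python
def find_all(text, fragment):
    """Return the start index of every (possibly overlapping) occurrence of fragment."""
    positions = []
    start = text.find(fragment)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping occurrences are found too.
        start = text.find(fragment, start + 1)
    return positions

text = "This is a really great story."
for fragment in ["eall", "reat", "ood"]:
    print(fragment, find_all(text, fragment))
```

A suffix tree would pay off when many fragments are looked up in the same large text; for a handful of short searches, repeated str.find calls are usually enough.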
Upvotes: 2
Reputation: 3989
I'm not entirely sure that regex alone offers a solution for what you're trying to do. You could, however, use the following code to create a regex for a given word, although the resulting pattern has the potential to become very long and slow:
// Returns every substring of $word that is at least $minLength characters long.
function wordPermutations( $word, $minLength = 2 )
{
    $perms = array( );
    for ($start = 0; $start < strlen( $word ); $start++)
    {
        for ($end = strlen( $word ); $end > $start; $end--)
        {
            $perm = substr( $word, $start, ($end - $start));
            if (strlen( $perm ) >= $minLength)
            {
                $perms[] = $perm;
            }
        }
    }
    return $perms;
}
Test Code:
$perms = wordPermutations( 'great', 3 ); // get all substrings of "great" that are 3 or more chars in length
var_dump( $perms );
echo ( '/\b('.implode( '|', $perms ).')\b/' );
Example Output:
array
  0 => string 'great' (length=5)
  1 => string 'grea' (length=4)
  2 => string 'gre' (length=3)
  3 => string 'reat' (length=4)
  4 => string 'rea' (length=3)
  5 => string 'eat' (length=3)
/\b(great|grea|gre|reat|rea|eat)\b/
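A rough Python equivalent of the generator above, applied to the sample sentence from the question, might look like this. The function name substrings is hypothetical, and this sketch deliberately leaves out the \b anchors so that fragments inside longer words also match; that is a choice made here, not part of the PHP code.

```python
import re

def substrings(word, min_length=2):
    """All contiguous substrings of word with at least min_length characters."""
    result = []
    for start in range(len(word)):
        # Walk the end index backwards so longer pieces come first for each start.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if len(piece) >= min_length:
                result.append(piece)
    return result

pieces = substrings("great", 3)
print(pieces)  # ['great', 'grea', 'gre', 'reat', 'rea', 'eat']

pattern = re.compile("(" + "|".join(re.escape(p) for p in pieces) + ")")
print(pattern.findall("This is a really great story."))  # ['rea', 'great']
```

The "rea" hit comes from "really", which shows how quickly short fragments start matching unrelated words.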
Upvotes: 1
Reputation:
I'm not sure I understand the problem correctly:
If you want to match "great" or "reat" you can express this by a pattern like:
"g?reat"
This simply says that the "reat"-part must exist and the "g" is optional.
This would match "reat" and "great" but not "eat", because the first "r" in "reat" is required.
If you have the two words "great" and "good" and you want to match them both with an optional "g", you can write it like this:
(g?reat|g?ood)
And if you want to include a word-boundary like:
\b(g?reat|g?ood)
Be aware that this would not match anything like "breat": the "reat" is there, but the "b" in front of it means the "r" is not at a word boundary.
So if you want to match whole words that contain a substring like "reat" or "ood", then you should try:
"\b\w*?(reat|ood)\w*\b"
This reads: 1. Starting at a word boundary, match any number of word characters, but don't be greedy. 2. Match "reat" or "ood"; this ensures that only words containing one of them are matched. 3. Match any number of word characters following "reat" or "ood" until the next word boundary is reached.
This will match:
"goodness", "good", "ood" (if a complete word)
It can be read as: Give me all complete words that contain "ood" or "reat".
Is that what you are looking for?
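The patterns in this answer can be tried out with a short Python sketch. The sample text is invented for the demonstration, and the second pattern uses \w* after the fragment so that words ending in it, such as "good", are matched too.

```python
import re

text = "good goodness ood great treat bread"

# Fragment with an optional leading "g", anchored at the start of a word.
optional_g = re.compile(r"\b(g?reat|g?ood)")
print(optional_g.findall(text))  # ['good', 'good', 'ood', 'great']

# Whole words that contain "reat" or "ood" anywhere inside them.
containing = re.compile(r"\b\w*?(?:reat|ood)\w*\b")
print(containing.findall(text))  # ['good', 'goodness', 'ood', 'great', 'treat']
```

"treat" shows the difference: the anchored pattern skips it because the "r" is not at a word boundary, while the second pattern returns the whole word.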
Upvotes: 2
Reputation: 34367
(good)*(great)*
after your edit:
\b(g*o*o*d*)*(g*r*e*a*t*)*\b
Upvotes: 3