Reputation: 33
I have a list of project names that I need to match against. The list of projects could look something like this:
suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada
etc
If the searched-for project is suzu, I'd like to get the following result from the list:
suzu
suzu-domestic
suzu-international
but not anything containing suzuran. I would also like to get the following matches if the searched-for project is suzuran:
suzuran
suzuran-international
but not any of the suzu projects.
In C# code I have something that looks similar to this:
String searchForProject = "suzu";
String regStr = @"THE_REGEX_GOES_HERE"; // The regStr will be in a config file
List<Project> projects = DataWrapper.GetAllProjects();
Regex regEx = new Regex(String.Format(regStr, searchForProject));
List<Project> result = new List<Project>(); // declared here so the snippet compiles on its own
foreach (Project proj in projects)
{
    if (regEx.IsMatch(proj.ProjectName))
    {
        result.Add(proj);
    }
}
The question is: can I have a regex that matches all project names belonging to the exact searched-for project, but not the extra ones that a StartsWith equivalent would return?
(Today I have regStr = @"^({0})#", but this does not satisfy the above scenario since it gives more hits than it should.)
I'd appreciate it if someone could give me a hint in the right direction. Thanks! Magnus
Upvotes: 3
Views: 92
Reputation: 8793
With negative lookahead:
suzu(?!.*ran).*\b
This also uses \b to assert a word boundary at the end of the match.
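To see what this pattern accepts, it can be run over the sample names from the question. The following is only a sketch: the class name LookaheadDemo and the extra name suzu-transit are hypothetical, added to illustrate that the lookahead rejects any name with "ran" anywhere after "suzu", not only suzuran.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Returns the names accepted by the negative-lookahead pattern.
    public static List<string> Filter(IEnumerable<string> names)
    {
        var regex = new Regex(@"suzu(?!.*ran).*\b");
        return names.Where(n => regex.IsMatch(n)).ToList();
    }

    public static void Main()
    {
        var names = new[]
        {
            "suzu", "suzu-domestic", "suzu-international",
            "suzuran", "suzuran-international",
            "suzu-transit" // hypothetical name, not part of the question
        };

        // Prints suzu, suzu-domestic and suzu-international.
        // suzu-transit is rejected too, because "transit" contains "ran".
        foreach (var name in Filter(names))
            Console.WriteLine(name);
    }
}
```

Note that the pattern is tied to this specific pair of names; searching for suzuran would need a different pattern.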
Upvotes: 0
Reputation: 23541
If you want an elegant solution in one line with LINQ and without regex, you can check this working solution (Demo on .NETFiddle):
using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    // Main and ExtractLines are static so the program also runs outside .NET Fiddle.
    public static void Main()
    {
        string input = "suzu";
        // The list entries stay unindented: leading spaces would become part of each name.
        string s = @"suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada";
        foreach (var line in ExtractLines(s, input))
            Console.WriteLine(line);
    }

    // Works if "-" is your delimiter.
    static IEnumerable<string> ExtractLines(string lines, string input)
    {
        return from line in lines.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) // split the string into lines
               let cleanLine = line.Contains("-") ? line.Split('-')[0] : line // keep only the part before the first "-"
               where cleanLine.Equals(input) // keep the line if that part equals the input
               select line; // return the matching line
    }
}
Upvotes: 0
Reputation: 46361
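You can use @"\b{0}\b.*" if you want the match anywhere in the string (but not in the middle of a word), or @"^{0}\b.*" if you only want it at the start. (In C# source, write the pattern as a verbatim string, since "\b" in a regular string literal is a backspace character.)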
See a regexstorm sample.
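The difference between the two templates only shows up when the project name appears somewhere other than the start. A minimal sketch, assuming a hypothetical name legacy-suzu that is not in the question (the class name AnchorDemo is also made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Builds the regex from a template containing {0} and returns the matching names.
    public static List<string> Filter(string template, string search, IEnumerable<string> names)
    {
        var regex = new Regex(String.Format(template, search));
        return names.Where(n => regex.IsMatch(n)).ToList();
    }

    public static void Main()
    {
        // "legacy-suzu" is a hypothetical name used only to show the difference.
        var names = new[] { "suzu", "suzu-domestic", "suzuran", "legacy-suzu" };

        // Anywhere in the string: suzu, suzu-domestic, legacy-suzu
        Console.WriteLine(String.Join(", ", Filter(@"\b{0}\b.*", "suzu", names)));

        // Only at the start: suzu, suzu-domestic
        Console.WriteLine(String.Join(", ", Filter(@"^{0}\b.*", "suzu", names)));
    }
}
```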
Upvotes: 0
Reputation: 31035
You can use a regex like this:
^suzu\b.*
If you want suzuran, just use:
^suzuran\b.*
Upvotes: 0
Reputation: 627488
All you need is actually
var regStr = @"^{0}\b";
The ^ anchor asserts the position at the beginning of the string.
The \b pattern matches a word boundary: a position between a word character and a non-word character, or the start or end of the string when the adjacent character is a word character. You do not need to match the rest of the string with .* because Regex.IsMatch only checks whether a match exists, so the trailing .* is redundant overhead.
C# test code:
var projects = new List<string>()
{
    "suzu", "suzu-domestic", "suzu-international",
    "suzuran", "suzuran-international",
    "scorpion", "scorpion-default",
    "yada", "yada-yada"
};
var searchForProject = "suzu";
var regStr = @"^{0}\b"; // The regStr will be in a config file
var regEx = new Regex(String.Format(regStr, searchForProject));
var result = new List<string>();
foreach (var proj in projects)
{
    if (regEx.IsMatch(proj))
    {
        result.Add(proj);
    }
}
The foreach may be replaced with a shorter LINQ expression:
var result = projects.Where(s => regEx.IsMatch(s)).ToList();
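One caveat, offered as an assumption rather than something the question states: if a project name can contain regex metacharacters such as ".", inserting it into the template unescaped changes the pattern. A minimal sketch using Regex.Escape; the names suzu.v2, suzu.v2-domestic and suzuxv2 and the class EscapeDemo are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Filters names with the ^{0}\b template, optionally escaping the search term first.
    public static List<string> Filter(string search, IEnumerable<string> names, bool escape)
    {
        var term = escape ? Regex.Escape(search) : search;
        var regex = new Regex(String.Format(@"^{0}\b", term));
        return names.Where(n => regex.IsMatch(n)).ToList();
    }

    public static void Main()
    {
        // Hypothetical project names containing a regex metacharacter (".").
        var names = new[] { "suzu.v2", "suzu.v2-domestic", "suzuxv2" };

        // Unescaped, "." matches any character, so suzuxv2 is returned as well.
        Console.WriteLine(String.Join(", ", Filter("suzu.v2", names, false)));

        // Escaped, only suzu.v2 and suzu.v2-domestic are returned.
        Console.WriteLine(String.Join(", ", Filter("suzu.v2", names, true)));
    }
}
```

Keep in mind that \b only asserts a boundary next to a word character, so a search term that ends in a non-word character would need a different trailing condition.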
Upvotes: 2