Metscore

Reputation: 33

Regular Expression - Get partial string

I have a list of project names that I need to do some matching on. The list of projects could look something like this:

suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada
etc

If the searched-for project is suzu, I'd like to get the following result from the list:

suzu
suzu-domestic
suzu-international

but not anything containing suzuran. I'd also like to get the following matches if the searched-for project is suzuran:

suzuran
suzuran-international

but not anything containing suzu.

In C# code I have something that looks similar to this:

String searchForProject = "suzu";
String regStr = @"THE_REGEX_GOES_HERE"; // The regStr will be in a config file
List<Project> projects = DataWrapper.GetAllProjects();
Regex regEx = new Regex(String.Format(regStr, searchForProject));
List<Project> result = new List<Project>();
foreach (Project proj in projects)
{
  if (regEx.IsMatch(proj.ProjectName))
  {
    result.Add(proj);
  }
}

The question is: can I have a regex that matches all the exact project names, but not the extra ones that a StartsWith equivalent would return? (Today I have regStr = @"^({0})#", but this does not satisfy the above scenario since it gives more hits than it should.)

I'd appreciate it if someone could give me a hint in the right direction. Thanks! Magnus

Upvotes: 3

Views: 92

Answers (5)

Derek

Reputation: 8793

With negative lookahead:

suzu(?!.*ran).*\b

This also uses \b for a word boundary.
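To see how the lookahead behaves, here is a minimal sketch that runs the pattern above against a few names. The class name LookaheadDemo and the project name "suzu-veteran" are made up for illustration and are not in the question. Because (?!.*ran) rejects "ran" anywhere after "suzu", a name like "suzu-veteran" is excluded as well, which may or may not be what you want.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Pattern from the answer above, with "suzu" hard-coded.
    public static readonly Regex Pattern = new Regex(@"suzu(?!.*ran).*\b");

    public static void Main()
    {
        // "suzu-veteran" is a hypothetical name, used only to show the side effect.
        string[] names = { "suzu", "suzu-domestic", "suzuran", "suzu-veteran" };
        foreach (string name in names)
        {
            // Prints each name followed by whether the pattern matches it.
            Console.WriteLine("{0}: {1}", name, Pattern.IsMatch(name));
        }
    }
}
```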

Upvotes: 0

aloisdg

Reputation: 23541

If you want an elegant one-line solution with LINQ and without regex, you can check this working solution (Demo on .NETFiddle):

using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        string input = "suzu";
        string s = @"suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada";

        foreach (var line in ExtractLines(s, input))
            Console.WriteLine(line);
    }

    // Works if "-" is your delimiter.
    static IEnumerable<string> ExtractLines(string lines, string input)
    {
        return from line in lines.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) // split the string into lines
            let cleanLine = line.Contains("-") ? line.Split('-')[0] : line // keep only the part before the first "-"
            where cleanLine.Equals(input) // keep lines whose prefix equals the input
            select line; // return the matching line
    }
}

Upvotes: 0

Amit

Reputation: 46361

You can use "\b{0}\b.*" if you want the match anywhere in the string (but not in the middle of a word), or "^{0}\b.*" if you only want it at the start.

See a regexstorm sample.
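The difference between the two patterns can be sketched as follows. The class AnchorDemo, its two helper methods, and the name "yada-suzu" are assumptions added for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Matches the search term as a whole word anywhere in the name.
    public static bool MatchesAnywhere(string search, string name)
    {
        return Regex.IsMatch(name, String.Format(@"\b{0}\b.*", search));
    }

    // Matches the search term as a whole word only at the start of the name.
    public static bool MatchesAtStart(string search, string name)
    {
        return Regex.IsMatch(name, String.Format(@"^{0}\b.*", search));
    }

    public static void Main()
    {
        // "yada-suzu" is a hypothetical name where "suzu" is not at the start.
        string[] names = { "suzu-domestic", "suzuran", "yada-suzu" };
        foreach (string name in names)
        {
            Console.WriteLine("{0}: anywhere={1}, start={2}",
                name, MatchesAnywhere("suzu", name), MatchesAtStart("suzu", name));
        }
    }
}
```

Only the anchored form rejects "yada-suzu"; both forms reject "suzuran" because there is no word boundary between "suzu" and "ran".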

Upvotes: 0

Federico Piazza

Reputation: 31035

You can use a regex like this:

^suzu\b.*

Working demo

If you want suzuran just use:

^suzuran\b.*

Upvotes: 0

Wiktor Stribiżew

Reputation: 627488

All you need is actually

var regStr = @"^{0}\b";

The ^ anchor asserts the position at the beginning of the string. The \b pattern matches a position between a word character and a non-word character, or between a word character and the start or end of the string. You do not need to match the rest of the string with .* since you are using Regex.IsMatch; it is redundant overhead.

C# test code:

var projects = new List<string>() { "suzu", "suzu-domestic", "suzu-international", "suzuran", "suzuran-international", "scorpion", "scorpion-default", "yada", "yada-yada" };
var searchForProject = "suzu";
var regStr = @"^{0}\b"; // The regStr will be in a config file

var regEx = new Regex(String.Format(regStr, searchForProject));
var result = new List<string>();
foreach (var proj in projects)
{
    if (regEx.IsMatch(proj))
    {
        result.Add(proj);
    }
}


The foreach may be replaced with a shorter LINQ:

var result = projects.Where(s => regEx.IsMatch(s)).ToList();
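If the search term can contain regex metacharacters, or if "-" is the only separator you want to accept, a stricter variant may be worth considering. The sketch below is an assumption on top of the code above, not part of it: the ProjectFilter class, its Filter helper, and the name "suzu.old" are hypothetical. It escapes the term with Regex.Escape and replaces \b with an explicit (?:-|$), so a name like "suzu.old", which ^suzu\b would accept, is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ProjectFilter
{
    // Hypothetical helper: keeps names equal to `search`
    // or starting with `search` followed by "-".
    public static List<string> Filter(IEnumerable<string> names, string search)
    {
        // Regex.Escape makes any regex metacharacters in the search term literal.
        var regEx = new Regex(String.Format(@"^{0}(?:-|$)", Regex.Escape(search)));
        return names.Where(n => regEx.IsMatch(n)).ToList();
    }

    public static void Main()
    {
        // "suzu.old" is a made-up name to show the difference from \b.
        var projects = new List<string>
        {
            "suzu", "suzu-domestic", "suzu-international",
            "suzuran", "suzuran-international", "suzu.old"
        };
        // Prints: suzu, suzu-domestic, suzu-international
        Console.WriteLine(String.Join(", ", Filter(projects, "suzu")));
    }
}
```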

Upvotes: 2
