Shazter

Reputation: 305

Match word index

I have to know in which word the match occurred.

I thought it would be possible with the code below, but it only gives me the character index of the match; there is no word counter.

Is it possible to find out after which word the match occurred?

const string stringToTest = "Am Rusch";
const string patternToMatch = @"\bRusch*";

Regex regex = new Regex(patternToMatch, RegexOptions.Compiled);

MatchCollection matches = regex.Matches(stringToTest);

foreach (Match match in matches)
{
    Console.WriteLine(match.Index);
}

The word index should be 1 (zero-based), because the match was found in the second word.

Upvotes: 1

Views: 979

Answers (4)

Wiktor Stribiżew

Reputation: 626794

The quickest method is to split the string into words and find the index of the word that matches the pattern:

const string stringToTest = "Am Rusch";
const string patternToMatch = @"\bRusch*";
Console.WriteLine(Regex.Split(stringToTest,@"[^\w\p{M}]+")
        .Where(m => !string.IsNullOrEmpty(m))
        .ToList()
        .FindIndex(p => Regex.IsMatch(p,patternToMatch))
);
// Output: 1

See the IDEONE demo

Explanations:

  • Regex.Split(stringToTest,@"[^\w\p{M}]+") splits the string into words, since [^\w\p{M}]+ matches one or more characters other than word characters and diacritic marks
  • .Where(m => !string.IsNullOrEmpty(m)) removes all empty elements
  • .FindIndex(p => Regex.IsMatch(p,patternToMatch)) fetches the zero-based index of the first word that matches the pattern, or -1 if no word matches.

A matching-based alternative that avoids having to remove empty elements:

Regex.Matches(stringToTest,@"[\w\p{M}]+") // Match all words
        .Cast<Match>()                    // Cast to IEnumerable<Match>
        .Select(m => m.Value)             // Collect values only
        .ToList()                         // Convert to list
        .FindIndex(p => Regex.IsMatch(p, patternToMatch)) 

Upvotes: 1

ΩmegaMan

Reputation: 31616

Regex is a pattern-matching tool; it is not designed to interpret the text.

If you want a basic word index, take the match index found by the regex and pass it to a method that encodes the heuristics of what counts as a word, such as this extension method:

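public static int AtWord(this string strBuffer, int index)
{
    // Split the text before the match on runs of whitespace.
    // No capturing group here: Regex.Split would otherwise add the
    // separators to the result and inflate the count for later words.
    var splits = Regex.Split(strBuffer.Substring(0, index).TrimStart(), @"\s+");

    // The last element is the (possibly empty) start of the word that
    // contains the match, so the zero-based word index is Length - 1.
    return splits.Length - 1;
}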

Used as

const string stringToTest = "Am Rusch Lunch";
const string patternToMatch = @"Rusch";

var match = Regex.Match(stringToTest, patternToMatch);

var wordIndex = stringToTest.AtWord(match.Index); // Returns 1, for a zero based list

Upvotes: 1

Saleem

Reputation: 8978

You can use a little trick to get the word index: after getting the character index of the match, run another regex over the text before it to count the words.

const string stringToTest = "Am Rusch, you dare";
const string patternToMatch = @"\bRusch*";

Regex regex = new Regex(patternToMatch, RegexOptions.Compiled);

MatchCollection matches = regex.Matches(stringToTest);

foreach (Match match in matches)
{
    // Split on runs of non-word characters (\W+), so that ", " counts as one separator
    var wordIndex = Regex.Split(stringToTest.Substring(0, match.Index), "\\W+").Count()-1;
    Console.WriteLine("Word Index: " + wordIndex);
}

This prints Word Index: 1

Upvotes: 1

RakouskyS

Reputation: 54

Split the string stringToTest by spaces; then you can easily find out in which word the match occurred.
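
A minimal sketch of that approach, assuming words are separated by spaces only. WordLocator and FindWordIndex are hypothetical names chosen for illustration, not part of any library:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordLocator
{
    // Hypothetical helper: returns the zero-based index of the first
    // space-separated word that matches the pattern, or -1 if none does.
    public static int FindWordIndex(string text, string pattern)
    {
        // RemoveEmptyEntries keeps repeated spaces from producing empty "words".
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Test each word on its own and report the position of the first hit.
        return Array.FindIndex(words, w => Regex.IsMatch(w, pattern));
    }
}
```

Called as WordLocator.FindWordIndex("Am Rusch", @"\bRusch*"), this should give 1. Note that punctuation stays attached to the words (for example "Rusch,"), and a pattern that spans several words will never match, because each word is tested separately.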

Upvotes: 1
