D P.

Reputation: 1059

C# regular expression to get words between 4 and 10 characters

I am trying to get all the words in a string that are at least 4 characters long and less than 10 characters long. When I use the following regular expression, it just returns the whole string as one word. Can you please look at the following example and tell me how I should write this regular expression?

string result = "Overfishing, erosion and warmer waters are feeding jellyfish blooms in coastal regions worldwide. And they're causing damage";
string[] words = Regex.Split(result, @"[\W]{4,10}");

foreach (string line in words)
{
    Console.WriteLine(line);
}

Upvotes: 2

Views: 1172

Answers (2)

Soner Gönül

Reputation: 98750

Without regex, you can use the String.Split method like this:

string result = "Overfishing, erosion and warmer waters are feeding jellyfish blooms in coastal regions worldwide. And they're causing damage";
var array = result.Split(new string[] {",", ".", " "}, StringSplitOptions.RemoveEmptyEntries);
foreach (var item in array)
{
   if(item.Length >= 4 && item.Length < 10)
      Console.WriteLine(item);
}

The output will be:

erosion
warmer
waters
feeding
jellyfish
blooms
coastal
regions
worldwide
they're
causing
damage

Here is a demonstration.

Upvotes: 2

p.s.w.g

Reputation: 149020

Your code isn't working because the pattern will only match a sequence of 4 to 10 consecutive non-word characters, which doesn't appear in the string. So Regex.Split just returns an array containing the original string.
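A quick way to check this is the minimal sketch below. The class and method names (SplitCheck, SplitOnNonWordRuns) are made up for illustration; the pattern is the one from the question. The longest run of non-word characters in the sample string is two (", " or ". "), so the pattern never matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitCheck
{
    // Splits the input on runs of 4 to 10 consecutive non-word characters,
    // exactly as the pattern in the question does.
    public static string[] SplitOnNonWordRuns(string input)
    {
        return Regex.Split(input, @"[\W]{4,10}");
    }
}
```

Calling SplitCheck.SplitOnNonWordRuns(result) on the string from the question gives an array of length 1 whose only element is equal to result.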

Try using this pattern:

\b\w{4,10}\b

For example:

string[] words = Regex.Matches(result, @"\b\w{4,10}\b")
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToArray();

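This will match any sequence of 4 to 10 consecutive word characters, surrounded by word boundaries. Note that {4,10} is inclusive at both ends, so 10-character words are matched too; use {4,9} if the words must be shorter than 10 characters.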
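For reference, here is the same idea as a small self-contained helper. This is only a sketch: the names WordFilter and WordsBetween are hypothetical, and both length bounds are treated as inclusive, so the caller decides whether the upper bound is 9 or 10.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // Returns every run of word characters whose length lies between
    // minLength and maxLength (both inclusive).
    public static string[] WordsBetween(string text, int minLength, int maxLength)
    {
        // The \b anchors require a word boundary on both sides, so a
        // fragment of a longer word is never returned.
        string pattern = @"\b\w{" + minLength + "," + maxLength + @"}\b";

        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With the string from the question, WordFilter.WordsBetween(result, 4, 9) skips "Overfishing" (11 characters) and the short words. One thing to be aware of: the apostrophe is not a word character, so "they're" comes back as "they" ("re" is too short to match). If contractions should stay whole, the String.Split approach handles that case differently.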

Upvotes: 4
