user990951

Reputation: 1479

C# regex pattern for getting words

I am trying to figure out the pattern that will get words from a string. Say for instance my string is:

string text = "HI/how.are.3.a.d.you.&/{}today 2z3";

I tried to eliminate anything that is only a single letter or digit, but it doesn't work:

Regex.Split(text, @"\b\w{1,1}\b");

I also tried this:

Regex.Split(text, @"\W+");

But it outputs:

"HI how are a d you today"

I just want to get all the words so that my final string is:

"HI how are you today"

Upvotes: 3

Views: 9962

Answers (1)

Ahmad Mageed

Reputation: 96477

To get all words that are at least 2 characters long you can use this pattern: \b[a-zA-Z]{2,}\b.

using System;
using System.Linq;
using System.Text.RegularExpressions;
string text = "HI/how.are.3.a.d.you.&/{}today 2z3";
var matches = Regex.Matches(text, @"\b[a-zA-Z]{2,}\b");
string result = String.Join(" ", matches.Cast<Match>().Select(m => m.Value));
Console.WriteLine(result); // HI how are you today

As others have pointed out in the comments, "A" and "I" are valid words. In case you decide to match those you can use this pattern instead:

var matches = Regex.Matches(text, @"\b(?:[a-z]{2,}|[ai])\b",
                            RegexOptions.IgnoreCase);
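
To see what this second pattern changes, here is a small self-contained sketch that runs it against the string from the question. The variable names are just for illustration. Note that the standalone "a" in the input is now kept, while "d" is still dropped:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "HI/how.are.3.a.d.you.&/{}today 2z3";

// Either a run of two or more letters, or a lone "a"/"i" (case-insensitive),
// each delimited by word boundaries.
var matches = Regex.Matches(text, @"\b(?:[a-z]{2,}|[ai])\b",
                            RegexOptions.IgnoreCase);

string result = string.Join(" ", matches.Cast<Match>().Select(m => m.Value));
Console.WriteLine(result); // HI how are a you today
```

Whether that extra "a" is wanted depends on the input; for the exact output asked for in the question, the first pattern is the closer fit.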

In both patterns I've used \b to match word boundaries. Because digits count as word characters, there is no boundary between a digit and a letter, so for input such as "1abc2" the "abc" wouldn't be matched. If you want it to be matched, remove the \b metacharacters. Doing so for the first pattern is straightforward; the second pattern would become [a-z]{2,}|[ai].
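
A quick way to check the effect of \b is to compare the first pattern with and without it. This is only a sketch, and the sample string and variable names are made up for the comparison:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: letters sandwiched between digits.
string sample = "1abc2";

// With \b: '1' and 'a' are both word characters, so no boundary exists
// between them and nothing matches.
int withBoundaries = Regex.Matches(sample, @"\b[a-zA-Z]{2,}\b").Count;

// Without \b: the run of letters is matched wherever it occurs.
string withoutBoundaries = string.Join(" ",
    Regex.Matches(sample, @"[a-zA-Z]{2,}").Cast<Match>().Select(m => m.Value));

Console.WriteLine(withBoundaries);    // 0
Console.WriteLine(withoutBoundaries); // abc
```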

Upvotes: 4
