jason

Reputation: 7164

Regular expression that works on dots

I have this regular expression:

string[] values = Regex
  .Matches(mystring4, @"([\w-[\d]][\w\s-[\d]]+)|([0-9]+)")
  .OfType<Match>()
  .Select(match => match.Value.Trim())
  .ToArray(); 

This regular expression turns this string: MY LIMITED COMPANY (52100000 / 58447000)

into these strings:

MY LIMITED COMPANY - 52100000 - 58447000

This also works on non-English characters.

But there is one problem: when I have a string such as MY. LIMITED. COMPANY., it splits at the dots too. I don't want that. I want the regular expression to treat dots as part of the name instead of splitting on them. How can I do that? Thanks.

Upvotes: 0

Views: 200

Answers (2)

Mario

Reputation: 36497

I'd simplify the expression. What if the names in the front include numbers? Note that my solution doesn't exactly mimic the original expression: it will allow numbers in the name part.

Let's start from the beginning:

  • To match words all you need is a sequence of word characters:

    \w+

    This will match any alphanumerical characters including underscores (_).

  • Considering you want the possibility of the word ending with a dot, you can add it and make it optional (one or zero matches):

    \w+\.?

    Note the escape (\.), which makes it a literal dot rather than the metacharacter that matches any character.

  • To match another potential word following, we now simply duplicate this match, add a white space before it, wrap it in a group, and allow that group to repeat zero or more times using the * quantifier:

    \w+\.?(?: \w+\.?)*

    In case you haven't seen it before: a group starting with ?: is a non-capturing group. In essence it works like a usual group, but it won't be saved as a captured group in your results.

  • And that's it already. This pattern will split your demo string as expected. Of course there could be other possible characters not being covered by this.
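The steps above can be tried out in C# with a minimal sketch like the following. The class and method names (WordGroupDemo, SplitParts) are made up for illustration, and the pattern assumes the follow-up group starts with a literal space, as described in the third step.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordGroupDemo
{
    // Hypothetical helper: returns every match of the simplified pattern.
    // Assumption: the optional follow-up group begins with one literal space.
    public static string[] SplitParts(string input)
    {
        return Regex
            .Matches(input, @"\w+\.?(?: \w+\.?)*")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        // Prints three lines:
        //   MY. LIMITED. COMPANY.
        //   52100000
        //   58447000
        foreach (var part in SplitParts("MY. LIMITED. COMPANY. (52100000 / 58447000)"))
            Console.WriteLine(part);
    }
}
```

Because \w also matches digits, a name such as "COMPANY 24" followed directly by a number would be returned as one match, which is the difference from the original expression mentioned at the top of this answer.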

You can see the results of this matching online here and also play around with it.

To test your regular expressions (and to learn them), I'd really recommend using a tool such as http://regex101.com

It has an input mask allowing you to provide your pattern and your target string. On the right-hand side it will first explain the pattern to you (so you can check that it's indeed what you had in mind), and below that it will show all the groups matched. Just keep in mind that it uses slightly different flavors of regular expressions than .NET, but this shouldn't matter for such simple patterns. (I'm not affiliated with that site, I just find it really useful.)

As an alternative, to use C#'s regex parser directly, you can also try this Regex Tester. It works in a similar way but doesn't include any explanations, which might not be ideal for someone just getting started.

Upvotes: 1

Wiktor Stribiżew

Reputation: 626927

You may add the dot after each \w in your pattern, and I also suggest removing unnecessary ( and ):

string[] values = Regex
      .Matches("MY. LIMITED. COMPANY. (52100000 / 58447000)", @"[\w.-[\d]][\w.\s-[\d]]+|[0-9]+")
      .OfType<Match>()
      .Select(match => match.Value.Trim())
      .ToArray(); 
foreach (var s in values)
    Console.WriteLine(s);

See the C# demo

Pattern:

  • [\w.-[\d]] - one Unicode letter or underscore ([\w-[\d]]) or a dot (.)
  • [\w.\s-[\d]]+ - 1 or more (due to + quantifier at the end) characters that are either Unicode letters or underscore, ., or whitespace (\s)
  • | - or
  • [0-9]+ - one or more ASCII-only digits
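To see what adding the dot changes, here is a small side-by-side sketch that runs the question's pattern (without the redundant parentheses) and the revised pattern against the same input. The class and helper names (SubtractionDemo, Parts) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // Hypothetical helper: returns the trimmed matches for a given pattern.
    public static string[] Parts(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value.Trim())
             .ToArray();

    public static void Main()
    {
        const string input = "MY. LIMITED. COMPANY. (52100000 / 58447000)";

        // Pattern from the question: a dot is in neither class, so it ends the match.
        // Prints: MY | LIMITED | COMPANY | 52100000 | 58447000
        Console.WriteLine(string.Join(" | ", Parts(input, @"[\w-[\d]][\w\s-[\d]]+|[0-9]+")));

        // Dot added to both classes: the dotted name stays in one piece.
        // Prints: MY. LIMITED. COMPANY. | 52100000 | 58447000
        Console.WriteLine(string.Join(" | ", Parts(input, @"[\w.-[\d]][\w.\s-[\d]]+|[0-9]+")));
    }
}
```

The -[\d] part is .NET character class subtraction: it removes digits from whatever the class would otherwise match, which is why the numbers are only picked up by the [0-9]+ alternative.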

Upvotes: 2
