ilansch

Reputation: 4878

C# regex does not function as expected

I am new to Regex. My input is:

2233    0 0     20180405    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

A line may contain only tabs, integers, floats, and the end-of-line/newline.

I read line content in C#:

using (var sourceStream = new StreamReader(sourceFilePath))
{
    string iteratedLine;
    while ((iteratedLine = sourceStream.ReadLine()) != null)
    {
        // iteratedLine = 2233\t0 0\t\t20180405\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0
        // iteratedLine is passed to the validation function here
    }
}

Then I pass iteratedLine to a validation function.
Only the following are allowed in the string:
1. tab
2. new line/end of line
3. number
4. float (0.123)

The following validation does not work. What am I missing?

bool isValid = Regex.IsMatch(inputLine, @"(\d+\.{1}\d*)|(\d)|(\\t)|(\\n)|(\\r)");

If I put the regex (\d+.{1}\d*)|(\d)|(\t)|(\n)|(\r) into regex101.com, I expect it to reject any line that contains characters other than these 4 kinds.

Thanks

Upvotes: 0

Views: 94

Answers (2)

Richardissimo

Reputation: 5765

You are missing a few points:

  1. You need to anchor both ends of the Regex to both ends of the string, so it needs ^ at the start and $ at the end. Without that, it can return true if any part of the line matches; but we only want it to return true if the whole line matches the pattern.
  2. StreamReader's ReadLine strips off the end of line, so you don't need to worry about that.
  3. You need to enforce that between the tabs are values, otherwise a line of just tabs would pass.

This should do the trick...

^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?)+$
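As a minimal sketch of how this pattern could be used from C#, the class and method names below (LineValidator, IsValidLine) are hypothetical and not from the question. It assumes each line comes from ReadLine, so there is no trailing newline to handle.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class LineValidator
{
    // Anchored pattern: a number, then one or more "tab + number" groups.
    private static readonly Regex LinePattern =
        new Regex(@"^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?)+$");

    // True only when the whole line matches the pattern.
    public static bool IsValidLine(string line) => LinePattern.IsMatch(line);
}
```

With this pattern, a line such as "2233\t0\t0.123" passes, while a line with letters, an empty field between two tabs, or only a single value does not.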

If you are expecting a particular number of values on each row, you could replace the final + with {x,x} where x is the number of items minus one.
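A sketch of the fixed-count variant, assuming the number of items per row is known in advance. FixedWidthValidator and BuildPattern are hypothetical names chosen for this example, and the pattern uses the equivalent short form {x} of the quantifier.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class FixedWidthValidator
{
    // Builds an anchored pattern that accepts exactly itemCount tab-separated numbers.
    public static Regex BuildPattern(int itemCount)
    {
        if (itemCount < 1)
            throw new ArgumentOutOfRangeException(nameof(itemCount));

        // The first value is matched on its own, so the group repeats itemCount - 1 times.
        int repeats = itemCount - 1;
        return new Regex(@"^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?){" + repeats + "}$");
    }
}
```

For example, BuildPattern(3) accepts "1\t2.5\t3" but rejects rows with two or four values.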

An alternative approach would be to use string.Split and use Linq to check that all the items return true from double.TryParse.
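One way the Split/TryParse alternative might look is sketched below. SplitValidator is a hypothetical name, and the choice of NumberStyles.AllowDecimalPoint with the invariant culture is an assumption made here so that signs, exponents, and culture-specific separators are not accepted.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration only.
public static class SplitValidator
{
    // True when every tab-separated item parses as a plain decimal number.
    public static bool IsValidLine(string line) =>
        line.Split('\t').All(item =>
            double.TryParse(item, NumberStyles.AllowDecimalPoint,
                            CultureInfo.InvariantCulture, out _));
}
```

This is not an exact equivalent of the regex: for instance, a line containing a single value passes here, whereas the pattern above requires at least two values.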

Upvotes: 2

The fourth bird

Reputation: 163237

You use Regex.IsMatch, which reports whether the pattern matches anywhere in the input string. Your regex is a set of alternations, and any one of them is enough for a match, for example a single digit. If the string also contains unwanted characters, the alternation still matches a digit elsewhere in the string, so IsMatch returns true.
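The behaviour can be reproduced with a small sketch. UnanchoredDemo and MatchesAnywhere are hypothetical names; the pattern is copied from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class UnanchoredDemo
{
    // The pattern from the question: alternatives with no ^ or $ anchors.
    // Note that inside a verbatim string, \\t is a literal backslash followed
    // by 't', not a tab character.
    public const string QuestionPattern = @"(\d+\.{1}\d*)|(\d)|(\\t)|(\\n)|(\\r)";

    // True when any alternative matches anywhere in the input.
    public static bool MatchesAnywhere(string input) =>
        Regex.IsMatch(input, QuestionPattern);
}
```

Here MatchesAnywhere("hello 7") returns true because the single digit satisfies the (\d) alternative, even though the rest of the string is not allowed.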


You could use the anchor ^ to assert the start of the line, then match one or more digits with \d+, followed by an optional part (?:\.\d+)? that matches a dot and one or more digits.

Then match a tab \t followed by one or more digits and the same optional fractional part, and assert the end of the line with $.

Repeat the second part one or more times so that there are at least 2 values separated by a tab.

^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?)+$


Upvotes: 2
