jonyroy

Reputation: 187

How to match all whitespace (if any) before words in a group using C#?

I want to match all whitespace (if any) before the words.

Regex re = new Regex(@"(\d+);([\d\.]+);([\d\.]+);([\w-\(\)\.,\/]+);(\d+);(\d+);([\d,]+);(\d+)", RegexOptions.Compiled);

The regex above works for Example-1 but not for Example-2. What do I need to change so that it also matches Example-2?

Example-1:
44;52.93; 8.24;GROSSENKNETEN;201902;28;408.7;28;509.86
71;48.22; 8.98;ALBSTADT-BADKAP;201902;28;475.3;28;-999.9
73;48.62;13.05;ALDERSBACH-KRIESTORF;201902;28;519.8;28;561.76
Example-2:
00044;52.93; 8.24;            GROSSENKNETEN;201907;31; 53.4;9; 28.6
00071;48.22; 8.98;          ALBSTADT-BADKAP;201907;31; 49.0;8;-999.9
00073;48.62;13.05;     ALDERSBACH-KRIESTORF;201907;31;  0.0;0; 15.7

Upvotes: 1

Views: 69

Answers (1)

Wiktor Stribiżew

Reputation: 626929

If you have access to full C# functionality, just read the file line by line and split each line on ; to get all the fields.
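The split approach might look like the sketch below. It is only an illustration: the class and method names (FieldSplitter, SplitLine) are made up for this example, and the Trim call is an assumption added here to drop the padding that Example-2 puts in front of some fields. For a real file, the same method could be applied to each line returned by File.ReadLines.

```csharp
using System;
using System.Linq;

public static class FieldSplitter
{
    // Hypothetical helper: splits one semicolon-delimited record
    // and trims the whitespace padding around each field.
    public static string[] SplitLine(string line) =>
        line.Split(';').Select(field => field.Trim()).ToArray();

    public static void Main()
    {
        // Sample line taken from Example-2 in the question.
        string line = "00044;52.93; 8.24;            GROSSENKNETEN;201907;31; 53.4;9; 28.6";

        string[] fields = SplitLine(line);

        // Prints: 00044|52.93|8.24|GROSSENKNETEN|201907|31|53.4|9|28.6
        Console.WriteLine(string.Join("|", fields));
    }
}
```

This keeps all nine fields and needs no regex at all, at the cost of not validating the shape of each field.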

If you are using a .NET regex-based tool and need to extract specific data from lines of text, you can use

(?m)^(\d+);\s*([\d.]+);\s*([\d.]+);\s*([\w-().,\/]+);\s*(\d+);\s*(\d+);\s*([\d.]+);\s*(\d+);\s*([-+]?\d*\.?\d+)\r?$

See the regex demo

In multiline mode, $ in a .NET regex matches only before an LF (\n), not before a CR (\r). That is why the pattern has \r? before $: it consumes the optional CR of a CRLF line ending.
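A small sketch of that behaviour, assuming input with CRLF line endings (the class and method names are invented for the demonstration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineAnchorDemo
{
    // Hypothetical helper: tests a pattern with RegexOptions.Multiline,
    // which is equivalent to the inline (?m) flag.
    public static bool EndsLine(string text, string pattern) =>
        Regex.IsMatch(text, pattern, RegexOptions.Multiline);

    public static void Main()
    {
        // Two lines terminated with CRLF.
        string text = "28.6\r\n-999.9\r\n";

        // The digit is followed by \r, so $ cannot match right after it.
        Console.WriteLine(EndsLine(text, @"\d$"));     // False

        // \r? consumes the CR, and $ then matches before the LF.
        Console.WriteLine(EndsLine(text, @"\d\r?$"));  // True
    }
}
```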

Pattern details

  • (?m) - multiline mode on
  • ^ - start of a line
  • (\d+) - Group 1: one or more digits
  • ; - a semi-colon
  • \s* - 0+ whitespaces
  • ([\d.]+) - Group 2: 1+ digits or dots
  • ;\s*([\d.]+);\s* - ;, 0+ whitespaces, Group 3: 1+ digits/dots, ;, 0+ whitespaces
  • ([\w-().,\/]+) - Group 4: 1+ word, -, (, ), ., ,, / chars
  • ;\s*(\d+);\s*(\d+);\s* - ;, 0+ whitespaces, Group 5: 1+ digits, ;, 0+ whitespaces, Group 6: 1+ digits, ;, 0+ whitespaces
  • ([\d.]+) - Group 7: 1+ digits/dots
  • ;\s*(\d+) - ;, 0+ whitespaces, Group 8: 1+ digits
  • ;\s* - ; and 0+ whitespaces
  • ([-+]?\d*\.?\d+) - Group 9: - or + optionally, then 0+ digits, an optional ., 1+ digits
  • \r?$ - an optional CR char and the end of the line.
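Put together, the pattern could be used from C# roughly as follows. This is a sketch rather than a definitive implementation: RecordParser and Parse are names invented here, the input is the Example-2 data from the question joined with CRLF, and only three of the nine groups are printed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordParser
{
    // The pattern from this answer, unchanged.
    private static readonly Regex RecordRegex = new Regex(
        @"(?m)^(\d+);\s*([\d.]+);\s*([\d.]+);\s*([\w-().,\/]+);\s*(\d+);\s*(\d+);\s*([\d.]+);\s*(\d+);\s*([-+]?\d*\.?\d+)\r?$",
        RegexOptions.Compiled);

    // Hypothetical helper: returns one match per well-formed line.
    public static MatchCollection Parse(string text) => RecordRegex.Matches(text);

    public static void Main()
    {
        // Example-2 lines from the question, separated by CRLF.
        string text =
            "00044;52.93; 8.24;            GROSSENKNETEN;201907;31; 53.4;9; 28.6\r\n" +
            "00071;48.22; 8.98;          ALBSTADT-BADKAP;201907;31; 49.0;8;-999.9\r\n" +
            "00073;48.62;13.05;     ALDERSBACH-KRIESTORF;201907;31;  0.0;0; 15.7";

        foreach (Match m in Parse(text))
        {
            // Group 1: id, Group 4: name, Group 9: last value.
            Console.WriteLine($"{m.Groups[1].Value} | {m.Groups[4].Value} | {m.Groups[9].Value}");
        }
    }
}
```

Because each \s* sits outside the capturing parentheses, the captured values come back without their leading padding, for example GROSSENKNETEN for Group 4 of the first line.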

Upvotes: 1
