Mark

Reputation: 178

How do you exclude a string from a Regular Expression so that if the string is present on a line of text, it will not return a match?

I know that a similar question to this has been asked before but I couldn't get that solution to work. It's this one

Regular expression to match a line that doesn't contain a word?

Here's the text

     ID   Type    Code    Test Name                  Dept    Date --- Time --- By
 ---- ---- ---------- ------------------------- ------ -------- --------

 01     S  10231AB=,+ Test1 With Spaces       20180913  1:08 AM ENIG01
 02     S  %SBTEX1    Test2 With Spaces       20180912 10:02 AM MYR001
 03     B  6399AB=    Test3 With Spaces       20180912 12:07 AM WDHLSY1
 04     S  4848AB=,4+ Test4 With Spaces       20180912 12:07 AM WDHLSY1
 05     S  899AB=,+   TSH+                    20180913  1:08 AM ENIG01
 06     S  899AB=,+   TSH+  

Lines 1 and 2 are not a match because they contain the text "10231" and "%SBTEX1".

Line 5 is the match.

Line 6 is not a match because it does not have a date such as "20180913" followed by the time.

I tried, but could not even come up with a regular expression that matched every line of text except line 6.

Here's the Regex from the post mentioned above. It excludes any line that contains a given word.

^((?!hede).)*$
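
For reference, here is a minimal Python sketch of how that pattern behaves. The sample strings are made up for illustration, and Python is used only because it is easy to run; the pattern itself is the one quoted from the linked post.

```python
import re

# Pattern quoted from the linked post: a line matches only if "hede"
# cannot be found starting at any position in it.
NO_HEDE = re.compile(r"^((?!hede).)*$")

# Hypothetical sample lines, not taken from the table in this question.
samples = ["hoho", "hihi", "haha", "hede", "say hede twice"]

kept = [s for s in samples if NO_HEDE.match(s)]
print(kept)  # ['hoho', 'hihi', 'haha']
```

The lookahead is checked before every single character, so the first position where "hede" begins stops the repetition and the `$` anchor can no longer be reached.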

The Question:

A big shout out to Wiktor Stribiżew who solved my original question. But I had omitted some text and when I tried to implement his solution, I realized the problem was more complicated than I had initially thought.

If you would like to see his solution to the original question, please visit the link below.

Wiktor's Solution To The Original Question

Wiktor, if you could, please post your solution on RegexStorm.Net/Tester again. That was amazing!

Thank you,

Mark S.

Upvotes: 0

Views: 120

Answers (2)

Mark

Reputation: 178

The answer for this particular problem is:

(?m)(?>^[\t\p{Zs}]*\d+\s+S\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*).+\d+[\p{Zs}\t]+\d+

Click the hyperlink below to be taken to this solution on RegexStorm.Net/Tester so you can mess around with the Regex yourself for learning purposes.

Interactive Solution On RegexStorm.Net/Tester

This will match lines 4 and 5, which is what I wanted. Originally I had

(?m)(?>^[\t\p{Zs}]*\d+\s+S\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*).+\d+\s+\d+

That only matched line 4. Then I read Wiktor's comment, which said:

"Remember to replace \s with [\p{Zs}\t] if you want to stay on a line while matching."
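
A rough Python sketch of why that advice matters, under some assumptions: the row pattern is simplified (no atomic group, no exclusion lookbehind), and because Python's `re` has no `\p{Zs}`, the class `[^\S\r\n]` from Wiktor's answer stands in for horizontal whitespace. Only rows 04 and 05 of the table are used.

```python
import re

text = (
    " 04     S  4848AB=,4+ Test4 With Spaces       20180912 12:07 AM WDHLSY1\n"
    " 05     S  899AB=,+   TSH+                    20180913  1:08 AM ENIG01\n"
)

# Simplified row patterns; only the whitespace class near the end differs.
greedy_ws = re.compile(r"^[ \t]*\d+\s+S\s+\S+.+\d+\s+\d+", re.MULTILINE)
same_line = re.compile(r"^[ \t]*\d+\s+S\s+\S+.+\d+[^\S\r\n]+\d+", re.MULTILINE)

crossing = greedy_ws.findall(text)  # \s+ may swallow the line break
staying = same_line.findall(text)   # horizontal whitespace only

print(len(crossing))  # 1
print(len(staying))   # 2
```

With `\s+`, the match for row 04 ends with the final "1" of WDHLSY1, the line break, and the "05" that starts the next row, so row 05 is partly consumed and cannot match on its own. With the horizontal-only class, each row matches separately and stops at its date and hour.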

So I then replaced the \s+ at the end of this Regex with [\p{Zs}\t]+ and got the answer that will work for my particular problem. One more time, it is:

(?m)(?>^[\t\p{Zs}]*\d+\s+S\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*).+\d+[\p{Zs}\t]+\d+

I would also encourage anyone who needs to exclude a string of text from matching in a Regex to adapt this solution to their own needs.
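
For anyone without a .NET engine at hand, here is a hedged Python sketch that reaches the same result by a different route. Python's `re` does not support `\p{Zs}` or variable-width lookbehind, so this is not a port of the Regex above: it checks each line's shape with one pattern and then tests the Code column separately. The names `ROW`, `EXCLUDED`, and `matching_ids` are made up for this example.

```python
import re

TABLE = """\
     ID   Type    Code    Test Name                  Dept    Date --- Time --- By
 ---- ---- ---------- ------------------------- ------ -------- --------

 01     S  10231AB=,+ Test1 With Spaces       20180913  1:08 AM ENIG01
 02     S  %SBTEX1    Test2 With Spaces       20180912 10:02 AM MYR001
 03     B  6399AB=    Test3 With Spaces       20180912 12:07 AM WDHLSY1
 04     S  4848AB=,4+ Test4 With Spaces       20180912 12:07 AM WDHLSY1
 05     S  899AB=,+   TSH+                    20180913  1:08 AM ENIG01
 06     S  899AB=,+   TSH+
"""

# Row shape: ID, type "S", one code token, then somewhere later a digit run,
# horizontal whitespace, and more digits (the date and the hour).
ROW = re.compile(
    r"^[ \t]*(?P<id>\d+)[ \t]+S[ \t]+(?P<code>\S+)[ \t].*\d+[ \t]+\d+"
)

# Strings to exclude, guarded so they are not part of a longer number.
EXCLUDED = re.compile(r"(?<!\d)(?:10231|%SBTEX1)(?!\d)")


def matching_ids(text):
    """Return the IDs of rows that have the right shape and an allowed code."""
    ids = []
    for line in text.splitlines():
        row = ROW.match(line)
        if row and not EXCLUDED.search(row.group("code")):
            ids.append(row.group("id"))
    return ids


print(matching_ids(TABLE))  # ['04', '05']
```

Splitting the work into a shape check and a separate exclusion check keeps each pattern short, at the cost of no longer being a single expression that can be pasted into a tester.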

Thank you Wiktor. I couldn't have gotten this solution without your help!

Upvotes: 0

Wiktor Stribiżew

Reputation: 626932

You may use

(?m)^\d+\s+\w\s+\d+(?<!\s(?:10231|91431))\r?$

See the regex demo.

I assume the lines do not start with whitespace, so I removed the initial \s+ from your pattern and added ^ as a start-of-line anchor. Since (?m) modifies the behavior of both ^ and $, \r? is necessary for $ to match at CRLF line endings.

Pattern details

  • (?m) - ^ now matches the start of a line and $ matches the end of a line
  • ^ - start of a line
  • \d+ - 1+ digits
  • \s+ - 1+ whitespaces (replace with [\p{Zs}\t]+ to only match horizontal whitespaces ([^\S\r\n]+ might also do))
  • \w - a word char
  • \s+ - 1+ whitespaces
  • \d+ - 1+ digits
  • (?<!\s(?:10231|91431)) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a whitespace and either of the two numeric values
  • \r?$ - an optional CR and end of a line anchor.
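
The pattern details above can be tried out with a short Python sketch. The lookbehind here is fixed-width, so Python's `re` accepts the pattern unchanged. The input string is hypothetical (the data from the original question is not reproduced in this thread), and the variable names are illustrative only.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"(?m)^\d+\s+\w\s+\d+(?<!\s(?:10231|91431))\r?$")

# Hypothetical CRLF-terminated input, made up for illustration.
text = "01 S 10231\r\n02 S 55555\r\n03 B 91431\r\n04 S 910231\r\n"

# Strip the optional CR that \r? may have consumed.
hits = [m.group(0).rstrip("\r") for m in PATTERN.finditer(text)]
print(hits)  # ['02 S 55555', '04 S 910231']
```

Row 04 still matches because the lookbehind requires a whitespace character directly before 10231, so a longer number that merely ends in those digits is not rejected.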

Upvotes: 2
