Reputation: 113
I have a regular expression that parses lines in a driver INF file to extract just the variable names and values, ignoring whitespace and end-of-line comments that begin with a semicolon.
It looks like this:
"^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )"
Most of the time it works just fine as per the example here: regex example 1
However, when it encounters a line that has a tab character anywhere between the variable name and the equals sign, the expression fails as per the example here: regex example 2
I have tried replacing "\s" with "\t" and "\x09" and it still doesn't work. I have opened the text file that contains the tab character in a hex editor and confirmed that it is indeed ASCII 0x09. I don't want to use a positive character match, as the variable could actually contain quite a large number of special characters.
The appearance of the literal "=" seems to cause the problem but I cannot understand why. For example, if I strip back the expression to this: regex example 3
and use the line with the tab character in it, it works fine. But as soon as I add the literal "=" as per the example here: regex example 4, it no longer matches, appearing to ignore the tab character.
Upvotes: 1
Views: 1047
Reputation: 67988
^([^=\s]+)\s*=\s*([^;\r\n]+)(?<!\s)
Try this. See the demo:
http://regex101.com/r/tV8oH3/2
Upvotes: 0
Reputation: 65077
I think you've just added the \t tab character in the wrong part.
This was your example 2 (not working):
^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )
This is your example 2 ... working (with a tab):
^([^=\s]+)[ \t]*=[ ]*([^;\r\n]+)(?<! )
            ^^ tab here
Seems to do the trick and match your first example: http://regex101.com/r/kQ1zH4/1
Upvotes: 0
Reputation: 32898
The two [ ]* match only space characters (U+0020 SPACE) and not other whitespace characters. Change both to [ \t]* to match tabs as well. The result would now look like:
"^([^=\s]+)[ \t]*=[ \t]*([^;\r\n]+)(?<! )"
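As a rough check, here is a minimal Python sketch comparing the original pattern with the adjusted one. The sample lines and the helper name `parse_line` are made up for illustration, and the asker's regex engine may differ from Python's `re` in small ways.

```python
import re

# Pattern from the question: only literal spaces are allowed around "=".
ORIGINAL = re.compile(r"^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )")
# Adjusted pattern: spaces or tabs are allowed around "=".
FIXED = re.compile(r"^([^=\s]+)[ \t]*=[ \t]*([^;\r\n]+)(?<! )")


def parse_line(pattern, line):
    """Return (name, value) if the line matches, otherwise None.

    Hypothetical helper, not part of the original question.
    """
    m = pattern.match(line)
    return (m.group(1), m.group(2)) if m else None


# Invented sample lines; the second has a tab before the equals sign.
with_spaces = "DriverVer = 01/01/2014 ; end of line comment"
with_tab = "DriverVer\t= 01/01/2014 ; end of line comment"

print(parse_line(ORIGINAL, with_spaces))  # matches
print(parse_line(ORIGINAL, with_tab))     # None: [ ]* cannot consume the tab
print(parse_line(FIXED, with_tab))        # matches once tabs are allowed
```

The original fails because `[^=\s]+` stops at the tab, `[ ]*` then matches zero characters, and the literal `=` is compared against the tab. Note that `(?<! )` still trims only trailing spaces, so a tab sitting just before the comment would stay in the captured value; `(?<![ \t])` would be one way to handle that if it matters.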
Upvotes: 2