Lembasts

Reputation: 113

Regular expression appears to ignore tab character

I have a regular expression that parses lines in a driver INF file to extract just the variable names and values, ignoring whitespace and end-of-line comments that begin with a semicolon.

It looks like this:

"^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )"
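To reproduce the behavior outside regex101, here is a minimal sketch using Python's `re` module. That engine is an assumption, since the question does not say what the script is written in, but the `(?<! )` lookbehind behaves the same way there. The sample lines and the `parse` helper are hypothetical.

```python
import re

# The question's pattern, copied verbatim into a Python raw string.
ORIGINAL = re.compile(r"^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )")


def parse(pattern, line):
    """Hypothetical helper: return (name, value) if the pattern matches, else None."""
    m = pattern.match(line)
    return m.groups() if m else None


# Hypothetical INF-style lines, identical except for a tab before "=".
spaced = "Class = Net ; device class"
tabbed = "Class\t= Net ; device class"

print(parse(ORIGINAL, spaced))  # ('Class', 'Net')
print(parse(ORIGINAL, tabbed))  # None
```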

Most of the time it works just fine as per the example here: regex example 1

However, when it encounters a line that has a tab character anywhere between the variable name and the equals sign, the expression fails as per the example here: regex example 2

I have tried replacing "\s" with "\t" and "\x09", and it still doesn't work. I have inspected the text file containing the tab character with a hex editor and confirmed that it is indeed ASCII 0x09. I don't want to use a positive character match, as the variable could contain quite a large number of special characters.

The literal "=" seems to be what triggers the problem, but I cannot understand why. For example, if I strip the expression back to this: regex example 3

and use the line with the tab character in it, it works fine. But as soon as I add the literal "=", as in regex example 4, it no longer matches, as if the tab character were being ignored.
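The linked examples 3 and 4 are not preserved here, so the two patterns below are only a guess at what they contained (sketched with Python's `re` on a hypothetical line). They suggest the tab is not actually ignored: `[^=\s]+` stops just before it, because `\s` includes tab. In the first pattern nothing has to match after the name, so the match succeeds without ever reaching the tab. In the second, `[ ]*` cannot consume the tab, so `=` is tried against it and fails.

```python
import re

line = "Class\t= Net ; device class"  # hypothetical line with a tab before "="

# Guess at example 3: just the variable name.
name_only = re.compile(r"^([^=\s]+)")
# Guess at example 4: the name, optional spaces, then a literal "=".
name_and_equals = re.compile(r"^([^=\s]+)[ ]*=")

print(name_only.match(line).group(1))  # Class (the match ends before the tab)
print(name_and_equals.match(line))     # None ("=" is tried against the tab)
```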

Upvotes: 1

Views: 1047

Answers (3)

vks

Reputation: 67988

^([^=\s]+)\s*=\s*([^;\r\n]+)(?<!\s)

Try this. See the demo:

http://regex101.com/r/tV8oH3/2
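A quick check of this pattern, again sketched with Python's `re` module on hypothetical lines: `\s*` accepts tabs on both sides of `=`, and the `(?<!\s)` lookbehind trims trailing tabs as well as trailing spaces from the value.

```python
import re

VKS = re.compile(r"^([^=\s]+)\s*=\s*([^;\r\n]+)(?<!\s)")

# Hypothetical lines: spaces only, then tabs around "=" and before ";".
for line in ["Class = Net ; device class", "Class\t=\tNet\t; device class"]:
    print(VKS.match(line).groups())  # ('Class', 'Net') both times
```

One thing to watch: `\s` also matches `\r` and `\n`, so if the pattern is run over a whole file with `re.MULTILINE` rather than line by line, `\s*` can let a match span a line break (for example `Key =` on one line and the value on the next).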

Upvotes: 0

Simon Whitehead

Reputation: 65077

I think you've just added the \t tab character in the wrong place.

This was your example 2 (not working):

^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )

This is your example 2 ... working (with a tab):

^([^=\s]+)[ \t]*=[ ]*([^;\r\n]+)(?<! )
            ^^ tab here

Seems to do the trick and match your first example: http://regex101.com/r/kQ1zH4/1
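Sketching this pattern in Python's `re` module (hypothetical lines again) shows that it fixes a tab before `=`. A tab after `=`, however, is still not consumed by `[ ]*`, so it ends up at the start of the captured value.

```python
import re

SIMON = re.compile(r"^([^=\s]+)[ \t]*=[ ]*([^;\r\n]+)(?<! )")

print(SIMON.match("Class\t= Net ; device class").groups())  # ('Class', 'Net')
print(SIMON.match("Class =\tNet ; device class").groups())  # ('Class', '\tNet')
```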

Upvotes: 0

Peter O.

Reputation: 32898

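The two [ ]* match only the space character (U+0020 SPACE), not other whitespace such as tab (U+0009). Since [^=\s]+ stops just before the tab and the following [ ]* cannot consume it, the literal = is compared against the tab and the match fails. Change both to [ \t]* to match tabs as well. The result would then look like: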

"^([^=\s]+)[ \t]*=[ \t]*([^;\r\n]+)(?<! )"
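A Python `re` sketch of this pattern, using hypothetical lines. Tabs on either side of `=` are now handled, but the `(?<! )` lookbehind only rejects a trailing space, so a tab just before the `;` stays in the value. The `TRIMMED` pattern is a variation not taken from the answer: its `(?<![ \t])` lookbehind trims both.

```python
import re

PETER = re.compile(r"^([^=\s]+)[ \t]*=[ \t]*([^;\r\n]+)(?<! )")
# Variation (not from the answer): also reject a trailing tab in the lookbehind.
TRIMMED = re.compile(r"^([^=\s]+)[ \t]*=[ \t]*([^;\r\n]+)(?<![ \t])")

print(PETER.match("Class\t=\tNet ; device class").groups())     # ('Class', 'Net')
print(PETER.match("Class\t=\tNet\t; device class").groups())    # ('Class', 'Net\t')
print(TRIMMED.match("Class\t=\tNet\t; device class").groups())  # ('Class', 'Net')
```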

Upvotes: 2
