symcbean

Reputation: 48357

Unexpected result from PowerShell regex

I am trying to identify errors in a log file. The application uses five uppercase letters followed by three digits followed by 'E' as an error code. The error code is followed by a non-word character. I was identifying cases with:

$errors = Select-String -Path "logfile.txt" -Pattern "[A-Z]{5}[0-9]{3}E\W"

However, the rest of the log content now includes strings such as

ab1bea8a-a00e-4211-b1db-2facecfd725e.

This is matched by the regex and flagged as an error. I changed the regex to

\p{Lu}{5}[0-9]{3}E\W

(which I expected to match five uppercase characters). Why does it still match the lowercase, non-error string?

Upvotes: 2

Views: 659

Answers (2)

Ansgar Wiechers

Reputation: 200293

PowerShell regular expression matching is case-insensitive by default. There are several ways to make matches case-sensitive, though.

  • Add the -CaseSensitive switch when using the Select-String cmdlet:

    -CaseSensitive

    Makes matches case-sensitive. By default, matches are not case-sensitive.

    C:\> 'abc' | Select-String -Pattern 'A'
    
    abc
    
    C:\> 'ABC' | Select-String -Pattern 'A'
    
    ABC
    
    C:\> 'abc' | Select-String -Pattern 'A' -CaseSensitive    # ← no match here
    C:\> 'ABC' | Select-String -Pattern 'A' -CaseSensitive
    
    ABC
    
  • Use the case-sensitive version of the regular expression matching operators:

    By default, all comparison operators are case-insensitive. To make a comparison operator case-sensitive, precede the operator name with a c. For example, the case-sensitive version of -eq is -ceq. To make the case-insensitivity explicit, precede the operator with an i. For example, the explicitly case-insensitive version of -eq is -ieq.

    C:\> 'abc' -match 'A'
    True
    C:\> 'ABC' -match 'A'
    True
    C:\> 'abc' -cmatch 'A'    # ← no match here
    False
    C:\> 'ABC' -cmatch 'A'
    True
    
  • Force a case-sensitive match by prepending the inline option (?-i) to the regular expression. This is a miscellaneous construct, (?...), not to be confused with a non-capturing group, (?:...); the minus sign switches off the "case-insensitive" regex option i. It works with both the Select-String cmdlet and the -match operator:

    C:\> 'abc' | Select-String -Pattern '(?-i)A'    # ← no match here
    C:\> 'ABC' | Select-String -Pattern '(?-i)A'
    
    ABC
    
    C:\> 'abc' -match '(?-i)A'    # ← no match here
    False
    C:\> 'ABC' -match '(?-i)A'
    True
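
Applied to the pattern from the question, the three options above might look like the following minimal sketch. The sample lines and the error code ABCDE123E are made up for illustration; only the GUID is taken from the question. Case-insensitively, the GUID matches because "cecfd725e." fits five letters, three digits, an "e", and a non-word character.

```powershell
# Hypothetical sample lines: one real error code, one GUID that merely looks like one.
$lines = @(
    'ABCDE123E: something failed',
    'request ab1bea8a-a00e-4211-b1db-2facecfd725e. completed'
)
$pattern = '[A-Z]{5}[0-9]{3}E\W'

# Default (case-insensitive): both lines match.
$loose = @($lines | Select-String -Pattern $pattern)

# Case-sensitive switch: only the real error code matches.
$strict = @($lines | Select-String -Pattern $pattern -CaseSensitive)

# The same check with the case-sensitive operator and with the inline option.
$viaOperator = @($lines | Where-Object { $_ -cmatch $pattern })
$viaInline   = @($lines | Select-String -Pattern "(?-i)$pattern")

"{0} loose, {1} strict" -f $loose.Count, $strict.Count    # 2 loose, 1 strict
```

Which variant to use is mostly a matter of taste; the inline option has the advantage of travelling with the pattern if it is stored in a config file or passed to other tools.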
    

Upvotes: 2

Tomalak

Reputation: 338228

The "case-insensitive" regex flag is set by Select-String, which makes \p{Lu} case-insensitive, just as it does with [A-Z].

Try adding the -CaseSensitive parameter to the command.

You can confirm this by running some .NET code, for example in LINQPad:

(new Regex(@"\p{Lu}", RegexOptions.IgnoreCase)).IsMatch("a")
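
If LINQPad is not at hand, the same check can probably be run straight from a PowerShell prompt via the static Regex.IsMatch method. This is only a sketch; the variable names are illustrative, and the IgnoreCase result is expected to agree with the C# expression above rather than asserted here.

```powershell
# Static Regex.IsMatch(input, pattern[, options]) from .NET, called from PowerShell.
$ignoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

# No options: the .NET default is case-sensitive, so \p{Lu} does not match 'a'.
$caseSensitive = [regex]::IsMatch('a', '\p{Lu}')

# With IgnoreCase, the option Select-String applies by default.
$caseInsensitive = [regex]::IsMatch('a', '\p{Lu}', $ignoreCase)

"case-sensitive: $caseSensitive"
"IgnoreCase:     $caseInsensitive"
```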

Upvotes: 4
