Gaara

Reputation: 697

Need help with pattern matching

Problem: match a sentence that starts with "hi" (case-insensitive) and is not immediately followed by a space and the letter d.

My regex: [hi|HI|hI|Hi][^ d|D][a-zA-Z ]*

However, I don't understand why the string "hI dave how are you doing" is accepted by my regex.

I am using Python's re library for this.
I have tried different variations, such as [^ ][^d|D], but none of them work.

Upvotes: 1

Views: 62

Answers (2)

hwnd

Reputation: 70732

You can't use alternation inside a character class. A character class defines a set of characters and means "match one character from this set". The easiest approach is a negative lookahead, combined with the inline (?i) case-insensitive modifier and a ^ anchor.

(?i)^hi(?! d).*

Explanation:

(?i)     # set flags for this block (case-insensitive) 
^        # the beginning of the string
 hi      #   'hi'
 (?!     #   look ahead to see if there is not:
   d     #     ' d'
 )       #   end of look-ahead
.*       # any character except \n (0 or more times)
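A minimal sketch of using this pattern with Python's re module. The helper name starts_with_hi_not_d and the sample sentences are illustrative assumptions, not part of the answer. Because (?i) applies to the whole pattern, the lookahead also rejects "hi Dave".

```python
import re

# Lookahead pattern from the answer: (?i) makes the whole pattern case-insensitive,
# ^ anchors at the start, and (?! d) rejects 'hi' followed by ' d' (or ' D').
PATTERN = re.compile(r"(?i)^hi(?! d).*")


def starts_with_hi_not_d(sentence: str) -> bool:
    """Hypothetical helper: True if sentence starts with 'hi' not followed by ' d'."""
    return PATTERN.match(sentence) is not None


for s in ["hI dave how are you doing", "hi Dave", "Hi there", "HID", "hi", "say hi"]:
    print(f"{s!r}: {starts_with_hi_not_d(s)}")
```

With these samples, "hI dave how are you doing" and "hi Dave" are rejected, while "Hi there", "HID" and a bare "hi" are accepted; "say hi" fails because of the ^ anchor.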

Upvotes: 4

1010

Reputation: 1848

Character sets are just characters between square brackets. You don't have to separate them with |, so this

 [hi|HI|hI|Hi]

matches just one character, which can be h, i, |, H or I.
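The difference is easy to check with Python's re. The second pattern below, a non-capturing group (?:...), is the usual way to write alternation; it is added here for comparison and is not part of the original answer.

```python
import re

# A character class matches exactly ONE character from the set {h, i, |, H, I}.
char_class = re.compile(r"[hi|HI|hI|Hi]")
# A non-capturing group with alternation matches one of the whole two-letter words.
alternation = re.compile(r"(?:hi|HI|hI|Hi)")

print(char_class.match("hI").group())   # prints: h   (only one character consumed)
print(char_class.match("|").group())    # prints: |   (the pipe is just another set member)
print(alternation.match("hI").group())  # prints: hI  (both letters consumed)
print(alternation.match("|"))           # prints: None
```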

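That's why your regex matches "hI dave how are you doing": the first character h is in the first class, the next character I isn't a space, d, | or D, and [a-zA-Z ]* then matches the rest of the string (" dave how are you doing"). Even if that last part matched zero times, the match would still succeed, because nothing forces it to reach the end of the input.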
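To see this concretely, the question's regex can be wrapped in capturing groups (added here only for inspection; they don't change what matches) and run with re.match:

```python
import re

# The regex from the question, with each of its three parts in its own group.
original = re.compile(r"([hi|HI|hI|Hi])([^ d|D])([a-zA-Z ]*)")

m = original.match("hI dave how are you doing")
print(m.groups())  # prints: ('h', 'I', ' dave how are you doing')
```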

Note that if you want to match the entire input, you have to use anchors to mark the beginning (^) and end ($) of the string.
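A small sketch of what the anchors change, reusing the lookahead pattern from the other answer; the sample text is an assumption. In Python, re.search scans the whole string, re.match only tries position 0, and re.fullmatch requires the entire string to match, as if the pattern were wrapped in ^...$.

```python
import re

text = "oh hi dave, hi there"

# Without ^, re.search skips the rejected 'hi dave' and finds the later 'hi there'.
unanchored = re.search(r"(?i)hi(?! d)", text)
print(unanchored.start())  # prints: 12

# With ^, the match must begin at position 0, so this sentence is rejected.
anchored = re.search(r"(?i)^hi(?! d)", text)
print(anchored)  # prints: None

# re.fullmatch needs the pattern to consume the whole input.
print(re.fullmatch(r"(?i)hi(?! d)", "hi there"))                # prints: None
print(re.fullmatch(r"(?i)hi(?! d).*", "hi there") is not None)  # prints: True
```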

So you should match the beginning of the string with ^, then h or H, followed by i or I, and finally something that is not "a space followed by d or D". That would be

^[hH][iI]( [^dD]|[^ ])

Note that the group allows either a space followed by anything but d or D, or, if there is no space right after hi, any single character.
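A quick check of this character-class pattern with Python's re; the sample strings are illustrative assumptions.

```python
import re

# Character-class version from this answer: no lookahead, just explicit alternatives.
pattern = re.compile(r"^[hH][iI]( [^dD]|[^ ])")

samples = ["hI dave how are you doing", "Hi there", "hid", "hi Dave", "hi"]
for s in samples:
    print(f"{s!r}: {bool(pattern.match(s))}")
```

With these samples, only "Hi there" and "hid" match. Unlike the lookahead version, this pattern requires at least one character after hi, so a bare "hi" is rejected.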

Upvotes: 1
