zhengchl

Reputation: 341

Why doesn't the regex "[ \A]abc" match "abc" in Python?

I want to match "abc" only when it is preceded by a space or sits at the start of the string. For example:

"abc_some_words" should match, because "abc" is at the start of the string
"some_words abc_some_words" should match, because there is a space before "abc"
"Aabc" should not match, because there is an "A" before "abc"

So I wrote the regex "[ \A]abc", since the documentation says "\A Matches only at the start of the string". As shown below, "[ \A]abc" matches " abc" but does not match "abc" in Python.

>>> import re
>>> re.search(r"[ \A]abc", "babc")
>>> re.search(r"[ \A]abc", "abc")
>>> re.search(r"[ \A]abc", " abc")
<_sre.SRE_Match object at 0xb6fccdb0>

Upvotes: 1

Views: 679

Answers (3)

dawg

Reputation: 103874

\A (start of string) is the mirror image of \Z (end of string).

The meaning of ^ and $ depends on the re.M flag. Without it, ^ matches only at the start of the string and $ only at the end of the string (or just before a trailing newline). With it, ^ also matches at the start of each line and $ at the end of each line.

However, \A is unambiguously the start of the string and \Z is unambiguously the end of the string, regardless of re.M.

Suppose you have the string:

txt='''\
1 ABC
2 ABC
3 ABC
4 ABC'''

To match the digit and ABC at the start of each line, you can do:

>>> re.findall(r'^\d\sABC', txt, re.M)
['1 ABC', '2 ABC', '3 ABC', '4 ABC']

But if you only want the first and the last line, you may do:

>>> re.findall(r'\A\d\sABC|\d\sABC\Z', txt, re.M)
['1 ABC', '4 ABC']
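
To make the effect of re.M on ^ explicit, here is a small sketch that runs the same kind of search three ways. The shortened string and the variable names are only for illustration:

```python
import re

txt = "1 ABC\n2 ABC\n3 ABC"

# Without re.M, ^ matches only at the very start of the string.
no_flag = re.findall(r"^\d", txt)

# With re.M, ^ also matches right after each newline.
with_flag = re.findall(r"^\d", txt, re.M)

# \A ignores re.M and still matches only at the very start.
string_start = re.findall(r"\A\d", txt, re.M)

print(no_flag, with_flag, string_start)
```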

Upvotes: 0

nneonneo

Reputation: 179452

Unfortunately, \A does not represent a character or a set of characters, so it (like the similar \Z) cannot be used inside a character class ([]). In the Python 2 session shown in the question, \A inside a character class is silently treated as a capital A; Python 3.6 and later reject the pattern with re.error instead.
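
If you are unsure how your interpreter treats \A inside a character class, a quick check along these lines should tell you. As far as I know, Python 3.6 and later raise re.error here, while older versions compile the pattern and read \A as a literal A. The outcome variable is just for illustration:

```python
import re

try:
    re.compile(r"[ \A]abc")
    # Older interpreters get here: \A was read as a literal 'A'.
    outcome = "compiled"
except re.error:
    # Newer interpreters refuse the unknown escape inside [...].
    outcome = "rejected"

print(outcome)
```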

To match either a space or the start of the string, you may use an alternation instead: (?:\A| )abc (where I used a non-capturing group (?:)).
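
As a rough check, here is the alternation run against the strings from the question. The samples list and the results dict are just illustrative names:

```python
import re

# Start of string OR a literal space, followed by "abc".
pattern = re.compile(r"(?:\A| )abc")

samples = [
    "abc_some_words",             # "abc" at the start of the string
    "some_words abc_some_words",  # "abc" preceded by a space
    "Aabc",                       # "abc" preceded by "A"
    "babc",                       # "abc" preceded by "b"
]

# Map each sample to whether the pattern is found anywhere in it.
results = {s: bool(pattern.search(s)) for s in samples}

for s, matched in results.items():
    print(repr(s), matched)
```

Only the first two strings should report a match.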

Upvotes: 3

Federico Piazza

Reputation: 31005

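If you want to match at the beginning of the string, you can use the anchor ^. For example, this regex matches "abc" at the start of the string, optionally preceded by a single whitespace character (it will not match "abc" after a space in the middle of the string):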

^\s?abc
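
A short sketch of what this pattern does and does not match, using strings like the ones in the question (the variable names are only for illustration):

```python
import re

pattern = re.compile(r"^\s?abc")

# "abc" right at the start of the string.
at_start = pattern.search("abc_some_words")

# One leading whitespace character, then "abc".
after_leading_space = pattern.search(" abc")

# "abc" after a space in the middle: ^ pins the match to position 0.
mid_string = pattern.search("some_words abc_some_words")

print(bool(at_start), bool(after_leading_space), bool(mid_string))
```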

Upvotes: 0
