Reputation: 6754
Why does the following code not match the word SELECT?
import re
re_q = r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})\.*\d*\+\d{2}\s|\s(SELECT).*'
raw_q = "2014-01-23 15:28:32.993995+04 | SELECT query_start, query from pg_stat_activity WHERE state='active'"
m = re.match( re_q, raw_q )
for i in range( 1, 8 ):
    print "Group <{0}>: {1}".format( i, m.group( i ) )
Output:
Group <1>: 2014
Group <2>: 01
Group <3>: 23
Group <4>: 15
Group <5>: 28
Group <6>: 32
Group <7>: None
Upvotes: 1
Views: 67
Reputation: 239463
From the docs,
'|'
A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
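A minimal sketch of the behaviour that passage describes. The patterns and input strings are made up for illustration, and the snippet is written for Python 3 (print as a function) rather than the Python 2 used in the question:

```python
import re

# Alternatives are tried left to right; the first branch that matches wins,
# even when a later branch would match more text.
first = re.match(r'a|ab', 'ab').group(0)

# A capturing group that sits in a branch which was not used is None.
unused = re.match(r'(a)|(b)', 'a').groups()

# An escaped pipe, or a pipe inside a character class, matches a literal '|'.
literal = re.match(r'x\|y', 'x|y').group(0)
in_class = re.match(r'x[|]y', 'x|y').group(0)

print(first)
print(unused)
print(literal, in_class)
```

The second case is the one that matters for the question: a group in the losing branch still exists, but its value is None.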
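| means OR in regular expression syntax, so your pattern is read as two alternatives: everything up to the first \s (the timestamp), or \s(SELECT).*. re.match tries the left alternative first, it matches the timestamp, and so the branch containing (SELECT) is never used and group 7 is None. To match a literal pipe you have to escape it with \. So, \s|\s should have been \s\|\s. After fixing that, I get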
Group <1>: 2014
Group <2>: 01
Group <3>: 23
Group <4>: 15
Group <5>: 28
Group <6>: 32
Group <7>: SELECT
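For comparison, a sketch that runs both patterns on the sample line from the question. The names bad_q, good_q, bad and good are only illustrative, and the code assumes Python 3 (print as a function) instead of the original Python 2:

```python
import re

raw_q = ("2014-01-23 15:28:32.993995+04 | SELECT query_start, query "
         "from pg_stat_activity WHERE state='active'")

timestamp = r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})\.*\d*\+\d{2}'

# Original pattern: the bare | splits the regex into two alternatives.
bad_q = timestamp + r'\s|\s(SELECT).*'
# Fixed pattern: \| matches the literal pipe in the input.
good_q = timestamp + r'\s\|\s(SELECT).*'

bad = re.match(bad_q, raw_q)
good = re.match(good_q, raw_q)

# With the original pattern only the timestamp branch matched,
# so the overall match stops before the pipe and group 7 is None.
print(repr(bad.group(0)))
print(bad.group(7))

# With the escaped pipe every group takes part in the match.
for i, value in enumerate(good.groups(), start=1):
    print("Group <{0}>: {1}".format(i, value))
```

Printing group(0) is a quick way to see how much of the input a pattern really consumed when a group unexpectedly comes back as None.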
Upvotes: 3