sergzach

Reputation: 6754

Why does a matching group exist but not actually match?

Why does the following code not match the word SELECT?

import re

re_q = r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})\.*\d*\+\d{2}\s|\s(SELECT).*'

raw_q = "2014-01-23 15:28:32.993995+04 | SELECT query_start, query from pg_stat_activity WHERE state='active'"

m = re.match( re_q, raw_q )

for i in range( 1, 8 ):
    print("Group <{0}>: {1}".format(i, m.group(i)))

Output:

Group <1>: 2014
Group <2>: 01
Group <3>: 23
Group <4>: 15
Group <5>: 28
Group <6>: 32
Group <7>: None

Upvotes: 1

Views: 67

Answers (1)

thefourtheye

Reputation: 239463

From the docs,

'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
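To see the quoted behaviour in isolation, here is a minimal sketch using toy patterns (these are illustrative and not taken from the question). It shows that branches are tried left to right, that a group in a branch that was not taken comes back as None, and the two documented ways to match a literal pipe.

```python
import re

# Branches are tried left to right; the first one that matches wins,
# even though the second branch would match more of the string.
first_wins = re.match(r'a|ab', 'ab').group(0)
print(first_wins)

# A group inside a branch that was not taken still exists, but is None.
untaken = re.match(r'(x)|(y)', 'y').groups()
print(untaken)

# Two ways to match a literal pipe character.
escaped = re.match(r'a\|b', 'a|b').group(0)
in_class = re.match(r'a[|]b', 'a|b').group(0)
print(escaped, in_class)
```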

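| means OR (alternation) in regular expressions, so your pattern is really two alternatives: everything before the | (the timestamp, ending in \s) or everything after it (\s(SELECT).*). The first alternative matches at the start of the string, so the second one, which contains group 7, is never tried. That is why group 7 exists but is None. To match a literal |, you have to escape it with \, so \s|\s should have been \s\|\s. After fixing that, I get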

Group <1>: 2014
Group <2>: 01
Group <3>: 23
Group <4>: 15
Group <5>: 28
Group <6>: 32
Group <7>: SELECT
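For comparison, here is a short sketch that runs the question's pattern and the escaped version side by side on the same input. The variable names (broken, fixed, and so on) are only for illustration.

```python
import re

raw_q = ("2014-01-23 15:28:32.993995+04 | SELECT query_start, query "
         "from pg_stat_activity WHERE state='active'")

# Original pattern: the unescaped | splits it into two alternatives.
broken = r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})\.*\d*\+\d{2}\s|\s(SELECT).*'
# Fixed pattern: \| matches a literal pipe character.
fixed = r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})\.*\d*\+\d{2}\s\|\s(SELECT).*'

broken_match = re.match(broken, raw_q)
fixed_match = re.match(fixed, raw_q)

# With the broken pattern only the timestamp branch is matched,
# so the overall match stops before the pipe and group 7 is None.
print(repr(broken_match.group(0)))
print(broken_match.group(7))

# With the fixed pattern the whole line is one branch and group 7 is filled.
print(fixed_match.group(7))
```

Printing group(0) is a quick way to check how much of the string a pattern actually consumed when a group unexpectedly comes back as None.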

Upvotes: 3
