Reputation: 12520
Given the following string as input:
[2015/06/09 14:21:59] mod=syn|cli=192.168.1.99/49244|srv=192.168.1.100/80|subj=cli|os=Windows 7 or 8|dist=0|params=none|raw_sig=4:128+0:0:1460:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0
I'm trying to match the value of subj, i.e. in the above case the expected output would be cli.
I don't understand why my regex is not working:
subj = re.match(r"(.*)subj=(.*?)|(.*)", line).group(2)
From what I can tell, the second group here should be cli, but I'm getting an empty result.
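A quick way to see where the empty string comes from: the unescaped | splits the whole pattern into two alternatives, (.*)subj=(.*?) or (.*). The first alternative already succeeds, and since nothing follows the lazy (.*?), it is satisfied by matching zero characters. The sketch below reruns the question's regex on the sample line (the variable name line is taken from the question's code):

```python
import re

line = "[2015/06/09 14:21:59] mod=syn|cli=192.168.1.99/49244|srv=192.168.1.100/80|subj=cli|os=Windows 7 or 8|dist=0|params=none|raw_sig=4:128+0:0:1460:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0"

# The unescaped | means: "(.*)subj=(.*?)" OR "(.*)".
m = re.match(r"(.*)subj=(.*?)|(.*)", line)

print(repr(m.group(2)))              # '' because the lazy (.*?) has nothing after it to force it to grow
print(m.group(0).endswith("subj="))  # True: the overall match stops right after "subj="
print(m.group(3))                    # None: the second alternative never took part
```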
Upvotes: 1
Views: 111
Reputation: 26667
The | has a special meaning in regex (it creates alternations), hence escape it:
>>> re.match(r"(.*)subj=(.*?)\|", line).group(2)
'cli'
Another Solution
You can use re.search(), which scans the whole string instead of only matching at the start, so you can drop the capturing group before subj= and anything after the |.
Example
>>> re.search(r"subj=(.*?)\|", line).group(1)
'cli'
Here we use group(1), since only one group is captured instead of two as in the previous version.
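To illustrate why re.search() is needed for the shorter pattern: re.match() only tries to match at position 0, so without the leading (.*) the same pattern finds nothing. A minimal sketch on the question's sample line:

```python
import re

line = "[2015/06/09 14:21:59] mod=syn|cli=192.168.1.99/49244|srv=192.168.1.100/80|subj=cli|os=Windows 7 or 8|dist=0|params=none|raw_sig=4:128+0:0:1460:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0"
pattern = r"subj=(.*?)\|"

# match() is anchored at the start of the string, which begins with "[", not "subj=".
print(re.match(pattern, line))             # None

# search() tries every position until the pattern fits.
print(re.search(pattern, line).group(1))   # cli
```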
Complex version
You can even get rid of all the capturing groups by using lookarounds:
>>> re.search(r"(?<=subj=).*?(?=\|)", line).group(0)
'cli'
(?<=subj=)
- Checks that the string matched by .*? is preceded by subj=.
.*?
- Matches anything, non-greedily.
(?=\|)
- Checks that this match is followed by a |.
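The lookarounds are zero-width: subj= and the | are checked but not included in the match. The sketch below confirms this by inspecting the characters just outside the match span (same sample line as in the question):

```python
import re

line = "[2015/06/09 14:21:59] mod=syn|cli=192.168.1.99/49244|srv=192.168.1.100/80|subj=cli|os=Windows 7 or 8|dist=0|params=none|raw_sig=4:128+0:0:1460:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0"

m = re.search(r"(?<=subj=).*?(?=\|)", line)
print(m.group(0))  # cli

# The lookbehind text sits immediately before the match, the lookahead text immediately after.
print(line[m.start() - len("subj="):m.start()])  # subj=
print(line[m.end()])                              # |
```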
Upvotes: 3
Reputation: 626903
I would use a negated character class [^|]* with re.search for better performance:
import re
p = re.compile(r'^(.*)subj=([^|]*)\|(.*)$')
test_str = "[2015/06/09 14:21:59] mod=syn|cli=192.168.1.99/49244|srv=192.168.1.100/80|subj=cli|os=Windows 7 or 8|dist=0|params=none|raw_sig=4:128+0:0:1460:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0"
print(re.search(p, test_str).group(2))  # Python 3 print function; prints cli
Note that I am not mixing lazy and greedy quantifiers in the regex (this is usually not advisable).
The pipe symbol must be escaped to be treated as a literal | symbol.
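The sketch below runs the compiled pattern from this answer and prints all three groups, to show what the first and third groups capture. As an extra illustration, assuming the subj value itself never contains a |, it also shows that the negated class alone already stops at the next pipe, so the trailing \|(.*)$ is only needed if you want the remainder of the line:

```python
import re

line = "[2015/06/09 14:21:59] mod=syn|cli=192.168.1.99/49244|srv=192.168.1.100/80|subj=cli|os=Windows 7 or 8|dist=0|params=none|raw_sig=4:128+0:0:1460:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0"

# Same pattern as above; p.search(line) is equivalent to re.search(p, line).
p = re.compile(r'^(.*)subj=([^|]*)\|(.*)$')
m = p.search(line)

print(m.group(1).endswith("srv=192.168.1.100/80|"))  # True: everything before subj=
print(m.group(2))                                   # cli
print(m.group(3).startswith("os=Windows 7 or 8|"))  # True: everything after the pipe

# [^|]* cannot run past a pipe, so it stops at the end of the value on its own.
print(re.search(r"subj=([^|]*)", line).group(1))    # cli
```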
REGEX EXPLANATION:
^
- Start of string
(.*)
- The first capturing group, matching characters from the beginning up to subj=
subj=
- A literal string subj=
([^|]*)
- The second capturing group, matching any characters other than a literal pipe (inside a character class, the pipe does not need escaping)
\|
- A literal pipe (must be escaped)
(.*)
- The third capturing group (if you need the rest of the string up to the end)
$
- End of string
Upvotes: 0
Reputation: 18783
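I'd recommend the following regex, which should perform better thanks to two changes (one addition and one substitution):
^
- Added: anchors the match at the start of the string.
[^\|]*
- Substituted for (.*?): a negated character class is faster than a lazy dot.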
Code
subj = re.match(r"^.*\|subj=([^\|]*)", line).group(1)
regex:
^.*\|subj=([^\|]*)
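A small check of two details, as a sketch on the question's sample line: re.match() already anchors at the start of the string, so the leading ^ does not change the result here, and inside a character class the pipe is literal, so [^\|] and [^|] match the same characters. Note that this pattern requires a | before subj=, so it assumes subj is never the first field.

```python
import re

line = "[2015/06/09 14:21:59] mod=syn|cli=192.168.1.99/49244|srv=192.168.1.100/80|subj=cli|os=Windows 7 or 8|dist=0|params=none|raw_sig=4:128+0:0:1460:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0"

# With and without ^: re.match() only ever tries position 0 anyway.
a = re.match(r"^.*\|subj=([^\|]*)", line).group(1)
b = re.match(r".*\|subj=([^\|]*)", line).group(1)

# Inside [...], | has no special meaning, so escaping it is optional.
c = re.match(r".*\|subj=([^|]*)", line).group(1)

print(a, b, c)  # cli cli cli
```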
Upvotes: 2
Reputation: 3099
The pipe sign | needs to be escaped, like so:
subj = re.match(r"(.*)subj=(.*?)\|(.*)", line).group(2)
Upvotes: 1
Reputation: 13640
You need to escape the |. Use the following:
subj = re.match(r"(.*)subj=(.*?)\|(.*)", line).group(2)
Upvotes: 2