I am trying the following pattern:
[,;\" ](.+?\/.+?)[\",; ]
in the following string:
['"text/html,application/xhtml+xml,application/xml;q=0.9;q =0.8"']
I want to extract `text/html`, `application/xhtml+xml` and `application/xml`, but the pattern only returns the first and the third of them, not the middle one. Why?
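The trailing `[,"; ]` in your pattern consumes the `,` after `text/html`. On the next iteration, the regex engine starts searching right after that comma, so the leading `[,;" ]` can no longer match it. The next delimiter it finds is the comma after `application/xhtml+xml`, so the search continues from there and `application/xhtml+xml` is skipped. Hence, you lose one match.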
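To see the lost match concretely, here is a small sketch that runs the question's original pattern with Python's `re` module and prints each whole match next to its Group 1 capture (the variable names are just illustrative). The first whole match ends with the comma after `text/html`, which is exactly the delimiter the next match would have needed.

```python
import re

# The question's original pattern: the trailing [\",; ] consumes a delimiter.
pattern = r'[,;\" ](.+?\/.+?)[\",; ]'
s = """['"text/html,application/xhtml+xml,application/xml;q=0.9;q =0.8"']"""

for m in re.finditer(pattern, s):
    # group(0) is the whole match, including both delimiters it consumed;
    # group(1) is the captured MIME type.
    print(repr(m.group(0)), m.span(), '->', m.group(1))

print(re.findall(pattern, s))  # => ['text/html', 'application/xml']
```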
You may turn the trailing `[,"; ]` into a non-consuming pattern (a positive lookahead), or better, since the matches cannot contain the delimiters, use a negated character class approach:
[,;" ]([^/,;" ]+/[^/,;" ]+)
If there can be more than one `/` inside the expected matches, remove the `/` char from the second character class.
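For comparison, here is a sketch of the two variants mentioned above, assuming Python's `re` module: the lookahead version, which requires a trailing delimiter without consuming it, and the negated-class version with `/` removed from the second class so that values with more than one slash are kept whole. The input `multi` is a made-up string for illustration only.

```python
import re

s = """['"text/html,application/xhtml+xml,application/xml;q=0.9;q =0.8"']"""

# Lookahead variant: (?=...) checks for a delimiter after the match but leaves
# it unconsumed, so the next match can start on it.
lookahead = r'[,;" ](.+?/.+?)(?=[",; ])'
print(re.findall(lookahead, s))
# => ['text/html', 'application/xhtml+xml', 'application/xml']

# Hypothetical input with a two-slash value, for illustration only.
multi = '"a/b/c,d/e;f"'

# Negated-class pattern as given: the second class excludes '/', so it stops early.
strict = r'[,;" ]([^/,;" ]+/[^/,;" ]+)'
print(re.findall(strict, multi))  # => ['a/b', 'd/e']

# Same pattern with '/' removed from the second class only.
multi_slash = r'[,;" ]([^/,;" ]+/[^,;" ]+)'
print(re.findall(multi_slash, multi))  # => ['a/b/c', 'd/e']
```

The lookahead version still relies on the lazy `.+?`, which can stretch across delimiters to reach a `/` if one appears later in the string. The negated character classes rule that out, which is one reason to prefer them here.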
Details

- `[,;" ]` - a comma, `;`, `"`, or space
- `([^/,;" ]+/[^/,;" ]+)` - Group 1: one or more chars other than `/`, `,`, `;`, `"` and space, then `/`, and then again one or more chars other than `/`, `,`, `;`, `"` and space, as many as possible

```python
import re

rx = r'[,;" ]([^/,;" ]+/[^/,;" ]+)'
s = """['"text/html,application/xhtml+xml,application/xml;q=0.9;q =0.8"']"""
# With one capturing group, findall returns only the Group 1 captures
res = re.findall(rx, s)
print(res) # => ['text/html', 'application/xhtml+xml', 'application/xml']
```
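If you want the breakdown above to live next to the regex itself, the same pattern can be written with `re.VERBOSE`. This is just a restatement of the answer's regex, not a different technique; the name `rx_verbose` is illustrative. In verbose mode, whitespace outside character classes is ignored, while the space inside `[,;" ]` is still matched literally.

```python
import re

rx_verbose = re.compile(r'''
    [,;" ]          # a comma, semicolon, double quote, or space (consumed)
    (               # Group 1:
      [^/,;" ]+     #   one or more chars other than / , ; " and space
      /             #   a literal slash
      [^/,;" ]+     #   again one or more chars other than / , ; " and space
    )
''', re.VERBOSE)

s = """['"text/html,application/xhtml+xml,application/xml;q=0.9;q =0.8"']"""
print(rx_verbose.findall(s))  # => ['text/html', 'application/xhtml+xml', 'application/xml']
```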