Match first parenthesis with Python

Question

From a string such as

70849   mozilla/5.0(linux;u;android4.2.1;zh-cn)applewebkit/534.30(khtml,likegecko)version/4.0mobilesafari/534.30

I want to get the content of the first pair of parentheses: linux;u;android4.2.1;zh-cn.

My code looks like this:

s=r'70849   mozilla/5.0(linux;u;android4.2.1;zh-cn)applewebkit/534.30(khtml,likegecko)version/4.0mobilesafari/534.30'
re.search(r"(\d+)\s.+\((\S+)\)", s).group(2)

but the result is the content of the last pair of parentheses, khtml,likegecko.

How can I solve this?

Wiktor Stribiżew · Accepted Answer

The main issue is the greedy .+ pattern. It first grabs the rest of the string, then backtracks, giving up one character at a time from the right until the subsequent patterns can match. As a result, it matches the last pair of parentheses.
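The backtracking behavior can be observed directly. The snippet below is a minimal sketch using the string from the question; it only checks where Group 2 of the original pattern starts, compared with the positions of the first and last opening parentheses.

```python
import re

s = ("70849   mozilla/5.0(linux;u;android4.2.1;zh-cn)"
     "applewebkit/534.30(khtml,likegecko)version/4.0mobilesafari/534.30")

# The pattern from the question: greedy .+ followed by a literal "(".
m = re.search(r"(\d+)\s.+\((\S+)\)", s)

first_open = s.index("(")   # position of the first "("
last_open = s.rindex("(")   # position of the last "("

print(m.group(2))                      # khtml,likegecko
print(m.start(2) == last_open + 1)     # True: Group 2 starts right after the LAST "("
print(m.start(2) == first_open + 1)    # False
```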

You can use

^(\d+)\s[^(]+\(([^()]+)\)

See the regex demo. Here, [^(]+ restricts matching to characters other than (, so it cannot run to the end of the line and has to stop right before the first pair of parentheses.

Pattern explanation:

^ - string start (NOTE: If the number appears not at the start of the string, remove this ^ anchor)
(\d+) - Group 1: 1 or more digits
\s - a whitespace (if it is not a required character, it can be removed since the subsequent negated character class will match the space)
[^(]+ - 1+ characters other than (
\( - a literal (
([^()]+) - Group 2 matching 1+ characters other than ( and )
\) - a literal closing ).
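The same breakdown can be kept next to the pattern itself. This is a sketch that assumes the pattern is compiled with re.VERBOSE, which ignores whitespace and # comments outside character classes; the variable names are illustrative only.

```python
import re

# Same pattern as above, one component per line with its explanation.
pattern = re.compile(r"""
    ^           # start of the string
    (\d+)       # Group 1: one or more digits
    \s          # a single whitespace character
    [^(]+       # one or more characters other than an opening parenthesis
    \(          # a literal opening parenthesis
    ([^()]+)    # Group 2: one or more characters other than parentheses
    \)          # a literal closing parenthesis
""", re.VERBOSE)

s = ("70849   mozilla/5.0(linux;u;android4.2.1;zh-cn)"
     "applewebkit/534.30(khtml,likegecko)version/4.0mobilesafari/534.30")

m = pattern.search(s)
print(m.groups())  # ('70849', 'linux;u;android4.2.1;zh-cn')
```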

Here is the IDEONE demo:

import re

# Anchored pattern: the number must be at the start of the string
p = re.compile(r'^(\d+)\s[^(]+\(([^()]+)\)')
test_str = "70849   mozilla/5.0(linux;u;android4.2.1;zh-cn)applewebkit/534.30(khtml,likegecko)version/4.0mobilesafari/534.30"
print(p.findall(test_str))
# [('70849', 'linux;u;android4.2.1;zh-cn')]

# Or use re.search without ^ if the number is not at the beginning of the string
m = re.search(r'(\d+)\s[^(]+\(([^()]+)\)', test_str)
if m:
    print("Number: {0}\nString: {1}".format(m.group(1), m.group(2)))
# Number: 70849
# String: linux;u;android4.2.1;zh-cn
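If only the parenthesized text is needed and the leading number does not matter, a shorter variant may be enough. The helper below is a hypothetical sketch, not part of the answer: first_parenthesized and FIRST_PARENS are made-up names, and it assumes the parentheses are not nested (with nested input it would return the first innermost pair).

```python
import re
from typing import Optional

# Hypothetical helper: a "(" followed by non-parenthesis characters and a ")".
FIRST_PARENS = re.compile(r"\(([^()]+)\)")

def first_parenthesized(text: str) -> Optional[str]:
    """Return the content of the first (...) pair, or None if there is none."""
    m = FIRST_PARENS.search(text)  # search scans left to right, so the leftmost match wins
    return m.group(1) if m else None

s = ("70849   mozilla/5.0(linux;u;android4.2.1;zh-cn)"
     "applewebkit/534.30(khtml,likegecko)version/4.0mobilesafari/534.30")

print(first_parenthesized(s))            # linux;u;android4.2.1;zh-cn
print(first_parenthesized("no parens"))  # None
```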
