user3431399

Reputation: 499

Python - Extract pattern from string using RegEx

I have a string in variable a as below:

a = 'foo(123456) together with foo(2468)'

I would like to use "re" to extract both foo(123456) and foo(2468) from the string.

I have two questions:

  1. What is the correct regex to use? foo(.*) doesn't seem to work, because .* matches 123456) together with foo(2468.
  2. How do I extract both occurrences of foo(...)?

Upvotes: 6

Views: 13595

Answers (5)

Shashank

Reputation: 13869

import re
pattern = re.compile(r'foo\(.*?\)')
test_str = 'foo(123456) together with foo(2468)'

for match in re.findall(pattern, test_str):
    print(match)

Two things:

  1. .*? is the lazy form of the quantifier. It behaves like the greedy .*, except that it matches as few characters as possible, extending left to right only until the rest of the pattern can match. If you want to require at least one character between the parentheses, use .+? instead.

  2. Use \( and \) instead of ( and ). Parentheses normally indicate capture groups inside a regular expression, so to match them literally you have to put the escape character, a backslash, in front of them.
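To make the greedy/lazy difference concrete, here is a small sketch that runs both forms on the string from the question. The variable names are only illustrative.

```python
import re

text = 'foo(123456) together with foo(2468)'

# Greedy: .* runs on to the last ')' in the string, so there is one long match.
greedy = re.findall(r'foo\(.*\)', text)

# Lazy: .*? stops at the first ')' that lets the pattern succeed.
lazy = re.findall(r'foo\(.*?\)', text)

# .*? accepts empty parentheses; .+? needs at least one character inside them.
empty_ok = re.findall(r'foo\(.*?\)', 'foo()')
non_empty = re.findall(r'foo\(.+?\)', 'foo()')

print(greedy)     # ['foo(123456) together with foo(2468)']
print(lazy)       # ['foo(123456)', 'foo(2468)']
print(empty_ok)   # ['foo()']
print(non_empty)  # []
```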

Upvotes: 9

pzp

Reputation: 6597

Use re.findall(r'foo\(.*?\)', a). The backslashes escape the parentheses (which otherwise denote a group in a regex), and the question mark makes the match non-greedy.

Upvotes: 1

Avinash Raj

Reputation: 174696

You could use a negated character class.

>>> import re
>>> a = 'foo(123456) together with foo(2468) foo(abcdef) together with foo(jqk)'
>>> re.findall(r'\bfoo\([^()]*\)', a)
['foo(123456)', 'foo(2468)', 'foo(abcdef)', 'foo(jqk)']

[^()]* is a negated character class: it matches any character other than ( or ), zero or more times.
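The negated class and the lazy wildcard give the same result on the question's input, but they can differ on other strings. The sketch below uses a made-up test string (not from the question) to show two differences: \b rejects foo in the middle of a longer word, and [^()]* refuses to cross a nested parenthesis.

```python
import re

# Hypothetical input: a plain call, "foo" embedded in a longer word,
# and a call with nested parentheses.
text = 'foo(1) snafoo(2) foo(a(b)c)'

# Lazy wildcard: matches inside "snafoo" and cuts the nested call short.
lazy = re.findall(r'foo\(.*?\)', text)

# Word boundary plus negated class: only the plain call matches.
negated = re.findall(r'\bfoo\([^()]*\)', text)

print(lazy)     # ['foo(1)', 'foo(2)', 'foo(a(b)']
print(negated)  # ['foo(1)']
```

Neither pattern handles nested parentheses properly; the negated class skips such input instead of returning a partial match.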

Upvotes: 4

irrelephant

Reputation: 4111

Simply use the non-greedy wildcard expression .*?

import re
a = 'foo(123456) together with foo(2468)'
for v in re.findall(r'foo\(.*?\)', a):
  print(v)

Upvotes: 2

Marcin

Reputation: 238061

You can use findall with the following expression: r'(foo\(\d+\))':

import re

a = 'foo(123456) together with foo(2468)'

for v in re.findall(r'(foo\(\d+\))', a):
    print(v)

Result is:

foo(123456)
foo(2468)

Your expression foo(.*) does not work because of the unescaped (). Unescaped, they form a capture group instead of matching literal parentheses. You need to escape them, as I did above.
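As a quick check of what the unescaped parentheses do, this sketch compares the question's original pattern with the escaped one. When a pattern contains a single group, re.findall returns the group's text, not the whole match.

```python
import re

a = 'foo(123456) together with foo(2468)'

# Unescaped: ( ) form a capture group, and the greedy .* captures the
# whole rest of the string after the first "foo".
unescaped = re.findall(r'foo(.*)', a)

# Escaped: \( and \) match literal parentheses around one or more digits.
escaped = re.findall(r'foo\(\d+\)', a)

# Wrapping the whole pattern in a group, as above, returns the same strings.
grouped = re.findall(r'(foo\(\d+\))', a)

print(unescaped)  # ['(123456) together with foo(2468)']
print(escaped)    # ['foo(123456)', 'foo(2468)']
print(grouped)    # ['foo(123456)', 'foo(2468)']
```

Note that \d+ only accepts digits, so a string such as foo(abcdef) would not match this pattern.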

Upvotes: 5
