Alex Yin

Reputation: 133

python regex search findall capturing groups

I just want to get "66664324", the content between ")" and "-". Why does the search method return the ")" and "-" themselves as well?

import re
a="(021)66664324-01"
b1=re.findall('\)(.*)-',a)
>['66664324']

b2=re.search('\)(.*)-',a).group()
>')66664324-'

What is the difference between the two code snippets?

Upvotes: 11

Views: 16450

Answers (2)

Avinash Raj

Reputation: 174706

Try using group(1) with re.search instead of group(). group() returns the whole match, while group(1) returns only the text captured by group 1 (the characters matched inside the first pair of parentheses).

>>> a="(021)66664324-01"
>>> import re
>>> b2=re.search('\)(.*)-',a).group(1)
>>> b2
'66664324'
>>> b2=re.search('\)(.*)-',a).group()
>>> b2
')66664324-'

re.findall, on the other hand, gives priority to groups over the whole match, and it returns its results as a list, whereas re.search returns a match object. That is why b1=re.findall('\)(.*)-',a) gives you the desired output. If the pattern contains a group, re.findall returns only the group contents, not the whole match (with more than one group, it returns a list of tuples). Only when the pattern has no groups does it return the whole match.

>>> b1=re.findall('\)(.*)-',a)
>>> b1
['66664324']
>>> b1=re.findall('\).*-',a)
>>> b1
[')66664324-']
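To put the two behaviours side by side, here is a small sketch using the same string from the question. The variable names (pattern, whole, inner, found, found_nc) are just mine for illustration, and I use a raw string for the pattern so that "\)" does not trigger an invalid-escape warning on newer Python versions. The last line is an extra I added: a non-capturing group, (?:...), which makes findall return whole matches again.

```python
import re

a = "(021)66664324-01"
pattern = r"\)(.*)-"  # raw string; same regex as in the question

m = re.search(pattern, a)   # returns a match object (or None if no match)
whole = m.group(0)          # same as m.group(): the entire match
inner = m.group(1)          # only the text captured by the first group

found = re.findall(pattern, a)          # one group: list of group strings
found_nc = re.findall(r"\)(?:.*)-", a)  # non-capturing group: list of whole matches

print(whole)     # )66664324-
print(inner)     # 66664324
print(found)     # ['66664324']
print(found_nc)  # [')66664324-']
```

In real code you would probably want to check that re.search did not return None before calling group() on it.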

Upvotes: 12

adesst

Reputation: 307

The difference is in b2.group(), which is equivalent to b2.group(0), i.e. the whole match. According to the Python regex manual:

the search() method of patterns scans through the string, so the match may not start at zero in that case

So in your case the text you want is in group 1, not group 0. I tried your code with a slightly modified pattern, and the expected result is at group index 1.

>>> import re
>>> a="(021)66664324-01"
>>> re.search('\)([0-9]*)',a).group(1)
'66664324'
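Two different kinds of "index" are in play here, so it may help to see them separately: the position in the string where the match starts, and the group number. This is only a rough sketch on the same input; the start()/end() calls are my addition and were not in the original snippet.

```python
import re

a = "(021)66664324-01"
m = re.search(r"\)([0-9]*)", a)

# Positions in the string: search() scans forward, so the match starts at 4, not 0.
print(m.start(0), m.end(0))  # 4 13
print(m.start(1), m.end(1))  # 5 13

# Group numbers: 0 is the whole match, 1 is the first parenthesised group.
print(m.group(0))  # )66664324
print(m.group(1))  # 66664324

# match() only tries at the beginning of the string, so it finds nothing here.
print(re.match(r"\)([0-9]*)", a))  # None
```

Note that this pattern drops the trailing "-", so group(0) ends right after the digits.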

Upvotes: 0
