Deep Gajjar

Reputation: 11

re.findall() not returning all matches?

I have the following string:

This$#is% Matrix#  %!

I am trying to capture substrings where special symbols or spaces occur between alphanumeric words. For example, my goal is to find these two substrings: This$#is (special symbols $ and # between 'This' and 'is') and is% Matrix (special symbol % and a space between 'is' and 'Matrix').

My regex findall is as follows:

match = re.findall(r'([\w]{1,})([\s\W]{1,})([\w]{1,})', temp)

It returns [('This', '$#', 'is')] but not the second part ('is% Matrix'). Am I doing something wrong?

If I change my string to 'is% Matrix' and apply the same regex pattern, I get this: [('is', '% ', 'Matrix')].

Upvotes: 0

Views: 953

Answers (1)

blhsing

Reputation: 106455

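re.findall() returns only non-overlapping matches. Your pattern consumes the trailing word 'is' as part of the first match, so the scan resumes after it and 'is' is no longer available to start the second match. You can use a positive lookahead for the part you want to overlap, so that it is captured without being consumed: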

match = re.findall(r'([\w]{1,})([\s\W]{1,})(?=([\w]{1,}))', temp)

match becomes:

[('This', '$#', 'is'), ('is', '% ', 'Matrix')]

Demo: https://regex101.com/r/2PJmlX/1
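
For a runnable side-by-side comparison, here is a minimal sketch using the string from the question. The names consuming, overlapping and spans are only illustrative. As an assumption for readability, the patterns are shortened to \w+ and \W+, which should be equivalent here because whitespace characters are already non-word characters.

```python
import re

temp = "This$#is% Matrix#  %!"

# Original approach: the trailing word is consumed by the match, so the
# next search starts after "is" and cannot reuse it as a leading word.
consuming = re.findall(r"(\w+)(\W+)(\w+)", temp)

# Lookahead approach: the trailing word is captured but not consumed,
# so it can also serve as the leading word of the next match.
pattern = re.compile(r"(\w+)(\W+)(?=(\w+))")
overlapping = pattern.findall(temp)

# The spans show how much of the string each match actually consumed.
spans = [m.span() for m in pattern.finditer(temp)]

print(consuming)    # [('This', '$#', 'is')]
print(overlapping)  # [('This', '$#', 'is'), ('is', '% ', 'Matrix')]
print(spans)        # [(0, 6), (6, 10)]
```

The first match ends at index 6, right before 'is', which is why the second match can start there.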

Upvotes: 1
