buhtz

Reputation: 12152

Get the regex match and the rest (non-match) from Python's re module

Does the re module of Python 3 offer a built-in way to get both the match and the rest (the non-matching part) back?

Here is a simple example:

>>> import re
>>> p = r'\d'
>>> s = '1a'
>>> re.findall(p, s)
['1']

The result I want is something like ['1', 'a'] or [['1'], ['a']] or something else where I can differentiate between match and rest.

Of course I can subtract the resulting (matching) string from the original one to get the rest. But is there a built-in way to do this?

I do not set the regex tag here because the question is less related to RegEx itself but more to a feature of a Python package.

Upvotes: 4

Views: 899

Answers (3)

Talon

Reputation: 1884

No, a match does not by itself expose the text that was left unmatched.

However, the Match object that the re module returns records where the match was found, so you can slice the original string around it:

import re
p = r'\d'
s = '1a'
match = next(re.finditer(p, s))
# >>> match
# <re.Match object; span=(0, 1), match='1'>

head = match.string[:match.start()]  # ""
tail = match.string[match.end():]  # "a"

Note that re.findall doesn't give you Match objects; you need a function that does, such as re.finditer. I'm using next() here because re.finditer returns an iterator instead of a list; you'd usually convert it to a list or loop over it.
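
As a minimal sketch of looping over re.finditer, the following collects the matches and the text between them in one pass. The helper name split_matches_and_rest is hypothetical, and dropping empty gaps is an assumption about what "the rest" should contain.

```python
import re

def split_matches_and_rest(pattern, text):
    """Return (matches, rest), where rest holds the non-empty text between matches."""
    matches = []
    rest = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, text):
        rest.append(text[pos:m.start()])  # text before this match
        matches.append(m.group())
        pos = m.end()
    rest.append(text[pos:])  # text after the last match
    # Assumption: empty gaps are not interesting, so drop them
    rest = [piece for piece in rest if piece]
    return matches, rest

print(split_matches_and_rest(r'\d', '1a2b'))
# (['1', '2'], ['a', 'b'])
```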


Another option would be to make these groups in your pattern directly.

If you're interested in both, before and after the match:

import re
p = r'(^.*?)(\d)(.*$)'
s = '1a'
re.findall(p, s)
# [('', '1', 'a')]

But this will not give you multiple results in the same string, because the matches would overlap and the built-in re library does not support variable-width lookbehinds.
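
To illustrate that limitation, here is the same pattern applied to a string with two digits (the input '1a2b' is just an example). The first match consumes the whole string, so the second digit never gets a result of its own:

```python
import re

p = r'(^.*?)(\d)(.*$)'
result = re.findall(p, '1a2b')
print(result)
# [('', '1', 'a2b')]
```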

If you're only interested in the string after the match, you can capture it inside a lookahead:

import re
p = r'(\d)(?=(.*))'
s = '1a'
re.findall(p, s)
# [('1', 'a')]
s = '1a2b'
re.findall(p, s)
# [('1', 'a2b'), ('2', 'b')]
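
Because the lookahead does not consume anything, each "rest" above runs to the end of the string and includes later matches. If the rest should stop at the next match instead, one possible variant (an assumption about the desired output, and specific to a single-digit pattern) is to capture the non-digits that follow:

```python
import re

p = r'(\d)(\D*)'  # a digit, then everything up to the next digit
pairs = re.findall(p, '1a2b')
print(pairs)
# [('1', 'a'), ('2', 'b')]
```

Note that any text before the first digit is not part of any tuple with this pattern.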

Upvotes: 2

Nilton Moura

Reputation: 320

You can match everything and use groups to "split" the important part from the rest:

>>> import re
>>> p = r'(\d+)(.*)'
>>> s = '12a\n34b\ncde'
>>> re.findall(p, s)
[('12', 'a'), ('34', 'b')]

re.findall documentation
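
As a related sketch (not part of the approach above), re.split also returns the separators when the pattern contains a capturing group, so matches and non-matches come back in one list. With exactly one group, the matches sit at the odd indices; the variable names here are illustrative.

```python
import re

s = '1a2b'
parts = re.split(r'(\d)', s)
print(parts)
# ['', '1', 'a', '2', 'b']

matches = parts[1::2]                              # captured separators
rest = [piece for piece in parts[0::2] if piece]   # non-empty text in between
print(matches, rest)
# ['1', '2'] ['a', 'b']
```

The leading empty string appears because the input starts with a match.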

Upvotes: 4

gremur

Reputation: 1690

A possible solution is the following:

import re

string = '1a'
re_pattern = r'^(\d+)(.*)'

result = re.findall(re_pattern, string)
print(result)

Returns a list of tuples

[('1', 'a')]

or, if you would rather get a flat list of str items

result = [item for t in re.findall(re_pattern, string) for item in t]
print(result)

Returns

['1', 'a']

Explanations to the code:

  • re_pattern = r'^(\d+)(.*)' looks for two groups at the start of the string: the 1st group (\d+) matches one or more digits, the 2nd group (.*) matches the rest of the string (up to the first newline, since . does not match newlines by default).
  • re.findall(re_pattern, string) returns a list of tuples like [('1', 'a')]
  • the list comprehension flattens the list of tuples into a list of strings
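
Because the pattern is anchored with ^, a string that does not start with a digit yields no result at all. A small sketch of handling that case with re.match, which anchors at the start of the string on its own; the function name leading_digits_and_rest and the None fallback are assumptions made for illustration:

```python
import re

def leading_digits_and_rest(text):
    """Return [digits, rest] if text starts with digits, else None."""
    m = re.match(r'(\d+)(.*)', text)  # re.match only matches at position 0
    if m is None:
        return None
    return list(m.groups())

print(leading_digits_and_rest('1a'))
# ['1', 'a']
print(leading_digits_and_rest('a1'))
# None
```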

Upvotes: 3
