Reputation: 12152
Does the re module of Python 3 offer a built-in way to get both the match and the rest (the non-matching part) back?
Here is a simple example:
>>> import re
>>> p = r'\d'
>>> s = '1a'
>>> re.findall(p, s)
['1']
The result I want is something like ['1', 'a'] or [['1'], ['a']] or something else where I can differentiate between match and rest.
Of course I can subtract the resulting (matching) string from the original one to get the rest. But is there a built-in way to do this?
I did not add the regex tag here because the question is less about regular expressions themselves and more about a feature of a Python package.
Upvotes: 4
Views: 899
Reputation: 1884
No, a match does not by itself expose the text that lies outside it.
The Match object that a regex gives you does record where the match was found, though, so you can use its start and end positions to slice out the rest:
import re
p = r'\d'
s = '1a'
match = next(re.finditer(p, s))
# >>> match
# <re.Match object; span=(0, 1), match='1'>
head = match.string[:match.start()] # ""
tail = match.string[match.end():] # "a"
Note that re.findall doesn't give you Match objects; you'll need a function that does, such as re.finditer. I'm using next() here because re.finditer returns an iterator instead of a list; you'd usually convert it to a list or loop over it.
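Looping over re.finditer could look roughly like the sketch below. The helper name split_matches and the (is_match, text) tuple layout are my own choices for illustration, not something re provides:

```python
import re

def split_matches(pattern, s):
    """Return (is_match, text) pairs covering the whole string in order."""
    parts = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, s):
        if m.start() > pos:
            # Text between the previous match and this one.
            parts.append((False, s[pos:m.start()]))
        parts.append((True, m.group()))
        pos = m.end()
    if pos < len(s):
        # Trailing text after the last match.
        parts.append((False, s[pos:]))
    return parts

print(split_matches(r'\d', '1a2b'))
# [(True, '1'), (False, 'a'), (True, '2'), (False, 'b')]
```

The flag in each tuple is what lets you tell a match from the rest.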
Another option is to capture these parts as groups directly in your pattern.
If you're interested in the text both before and after the match:
import re
p = r'(^.*?)(\d)(.*$)'
s = '1a'
re.findall(p, s)
# [('', '1', 'a')]
But this will not give you multiple results in the same string, because the results would overlap and you can't have variable-width lookbehinds in the built-in re library.
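As a quick check of that limitation, here is the same pattern run on a string with two digits (the input '1a2b' is just an illustrative choice). Only the first digit is reported, and the second one ends up inside the trailing group:

```python
import re

p = r'(^.*?)(\d)(.*$)'
s = '1a2b'

# The first match consumes the whole string, so nothing is left for a second one.
result = re.findall(p, s)
print(result)
# [('', '1', 'a2b')]
```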
If you're only interested in the string after the match, you can capture it inside a lookahead, which does not consume it:
import re
p = r'(\d)(?=(.*))'
s = '1a'
re.findall(p, s)
# [('1', 'a')]
s = '1a2b'
re.findall(p, s)
# [('1', 'a2b'), ('2', 'b')]
Upvotes: 2
Reputation: 320
You can match everything and create groups to "split" the important part from the rest:
>>> import re
>>> p = r'(\d+)(.*)'
>>> s = '12a\n34b\ncde'
>>> re.findall(p, s)
[('12', 'a'), ('34', 'b')]
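The result above has one tuple per line because . does not match a newline by default, and the last line, cde, has no digit and so is skipped. As a small sketch of that behaviour, here is the same pattern with and without the re.DOTALL flag:

```python
import re

p = r'(\d+)(.*)'
s = '12a\n34b\ncde'

# Default: '.' stops at each newline, so every line with digits matches separately.
per_line = re.findall(p, s)
# With re.DOTALL, '.' also matches newlines, so the first match takes everything.
whole = re.findall(p, s, flags=re.DOTALL)

print(per_line)  # [('12', 'a'), ('34', 'b')]
print(whole)     # [('12', 'a\n34b\ncde')]
```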
Upvotes: 4
Reputation: 1690
A possible solution is the following:
import re
string = '1a'
re_pattern = r'^(\d+)(.*)'
result = re.findall(re_pattern, string)
print(result)
This returns a list of tuples:
[('1', 'a')]
Or, if you would rather get a flat list of str items:
result = [item for t in re.findall(re_pattern, string) for item in t]
print(result)
This returns:
['1', 'a']
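Explanation of the code:
re_pattern = r'^(\d+)(.*)' looks for two groups at the start of the string: the first group (\d+) matches one or more digits, and the second group (.*) matches the rest of the string (up to the next newline, since . does not match newlines by default).
re.findall(re_pattern, string) returns a list of tuples such as [('1', 'a')].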
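A related technique, not the one used in the code above, is re.split with a capturing group. When the pattern contains a capturing group, re.split also returns the matched text, so matches and the rest arrive in one flat list. The input '12a34b' is an illustrative choice:

```python
import re

string = '12a34b'

# With exactly one capturing group, even indices hold the text between
# matches and odd indices hold the matches themselves.
parts = re.split(r'(\d+)', string)
print(parts)
# ['', '12', 'a', '34', 'b']

matches = parts[1::2]
rest = [p for p in parts[0::2] if p]  # drop empty strings at the edges
print(matches)  # ['12', '34']
print(rest)     # ['a', 'b']
```

The leading empty string appears because the input starts with a match. The index trick only holds when the pattern has exactly one capturing group.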
Upvotes: 3