Reputation: 684
I have this string:
"Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
And a regex pattern like this:
((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)
or
(Za\s)?@[A-Za-z0-9_]*
I want it to return this list:
['Za @Foo_Bar','BAR_foo','FooBAR','BArfoo']
But I'm getting unexpected results:
>>> import re
>>> import regex
>>> a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
>>> regex.fullmatch(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a) is None
True
>>> re.findall(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
[('Za @Foo_Bar', 'Za ', ''), ('@BAR_foo', '', ''), ('@FooBAR', '', ''), ('@BArfoo', '', '')]
The second result is more convincing but it contains a lot of junk values:
>>> regex.findall(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
[('Za @Foo_Bar', 'Za ', ''), ('@BAR_foo', '', ''), ('@FooBAR', '', ''), ('@BArfoo', '', '')]
>>> match = re.search(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
>>> match.groups()
('Za @Foo_Bar', 'Za ', None)
Why does fullmatch return None? How can I get a clean list?
Upvotes: 0
Views: 87
Reputation: 163517
As an alternative, you could split on the pattern (?<!\AZa):? @, which matches an optional colon followed by a space and an @, except when that space directly follows the Za at the very start of the string:
import re
s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
print(re.split(r'(?<!\AZa):? @', s))  # raw string, so \A is not treated as a string escape
Result
['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
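To see what the lookbehind does and does not protect, here is a small sketch that runs the same split on two extra inputs. Both extra strings are made up for illustration and are not from the question.

```python
import re

pattern = r'(?<!\AZa):? @'

# The original input: the " @" right after the leading "Za" is protected.
print(re.split(pattern, "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"))
# ['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']

# No leading "Za": the first mention keeps its "@", since there is
# no preceding space to split on.
no_prefix = re.split(pattern, "@Foo_Bar: @BAR_foo")
print(no_prefix)  # ['@Foo_Bar', 'BAR_foo']

# A later "Za" is not at the start of the string, so the split happens there.
later_za = re.split(pattern, "Za @A Za @B")
print(later_za)  # ['Za @A Za', 'B']
```

So this approach fits best when the input always has the shape shown in the question.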
Upvotes: 1
Reputation: 91508
Don't use groups:
import re
s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
g = re.findall(r'(?:Za\s)@\w+|(?<=@)\w+', s)
print(g)
Output:
['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
Explanation:
(?:Za\s)   # non-capturing group: "Za" followed by a whitespace character
@          # literal @
\w+        # 1 or more word characters
|          # or
(?<=@)     # lookbehind: make sure there is an @ immediately before
\w+        # 1 or more word characters
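If you want to keep that explanation next to the pattern itself, one option is the re.VERBOSE flag, which lets whitespace and # comments live inside the pattern. This is just a sketch of the same regex; the name mention is only illustrative.

```python
import re

s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

# Same pattern as above, written with re.VERBOSE so it can carry comments.
mention = re.compile(r'''
    (?:Za\s)   # non-capturing group: "Za" plus one whitespace character
    @          # literal @
    \w+        # one or more word characters
  |            # or
    (?<=@)     # lookbehind: an @ must come immediately before
    \w+        # one or more word characters
''', re.VERBOSE)

print(mention.findall(s))
# ['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
```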
Upvotes: 1
Reputation: 1124070
regex.fullmatch() is the wrong method to use here; I don't think you understood what it is useful for.
From the regex module documentation:
fullmatch behaves like match, except that it must match all of the string.
Your pattern doesn't match all of the input string. Only if the pattern covers everything, from the first character to the last, will fullmatch() return a match.
Where re.match() only matches at the start of the string, as if you had added \A to the start of your pattern, regex.fullmatch() matches as if you had added \A to the start and \Z to the end of your pattern.
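A quick sketch of that difference, assuming the standard library's re.fullmatch() (available since Python 3.4) behaves like the regex module's version for this case, and using a shortened pattern. The variable names are only for illustration.

```python
import re

a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
pattern = r'(?:Za\s)?@[A-Za-z0-9_]*'

# match() anchors only at the start, so it finds the first mention.
start_only = re.match(pattern, a)

# fullmatch() must also reach the end of the string, so it fails here.
whole = re.fullmatch(pattern, a)

# Emulating fullmatch() with explicit anchors gives the same answer.
anchored = re.search(r'\A(?:' + pattern + r')\Z', a)

print(start_only.group())  # Za @Foo_Bar
print(whole)               # None
print(anchored)            # None

# fullmatch() succeeds only when the pattern covers the entire input.
print(re.fullmatch(pattern, "Za @Foo_Bar").group())  # Za @Foo_Bar
```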
Note that you don't need the |(@[A-Za-z0-9_]*) option; that pattern is already fully covered by the (Za\s)?@[A-Za-z0-9_]* part when (Za\s)? doesn't match.
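You can check this yourself by comparing the whole matches (group 0) of both patterns. A minimal sketch; the variable names are mine.

```python
import re

a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

with_alternative = r'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)'
without_alternative = r'(Za\s)?@[A-Za-z0-9_]*'

# group(0) is the whole match, independent of any capturing groups.
full_a = [m.group(0) for m in re.finditer(with_alternative, a)]
full_b = [m.group(0) for m in re.finditer(without_alternative, a)]
print(full_a == full_b)  # True

# The third group (the extra alternative) never takes part in a match.
third_unused = all(m.group(3) is None for m in re.finditer(with_alternative, a))
print(third_unused)  # True
```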
To get a clean list, use re.findall(), but use a (?:...) non-capturing group for the optional part so you don't get a separate string for each group in the re.findall() result:
>>> import re
>>> a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
>>> re.findall(r'(?:Za\s)?@[A-Za-z0-9_]*', a)
['Za @Foo_Bar', '@BAR_foo', '@FooBAR', '@BArfoo']
With no capturing groups, the whole match is returned.
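For reference, here is a small sketch of how the number of capturing groups changes what re.findall() returns. The last step strips the leading @ to get the exact list asked for in the question. That step is my own addition, not something re does for you, and the shortened sample string is just for illustration.

```python
import re

short = "Za @Foo_Bar: @BAR_foo"

# No capturing groups: a list of whole matches.
print(re.findall(r'(?:Za\s)?@\w*', short))  # ['Za @Foo_Bar', '@BAR_foo']
# One capturing group: a list of that group's text ('' if it did not match).
print(re.findall(r'(Za\s)?@\w*', short))    # ['Za ', '']
# Several capturing groups: a list of tuples, one item per group.
print(re.findall(r'(Za\s)?@(\w*)', short))  # [('Za ', 'Foo_Bar'), ('', 'BAR_foo')]

# Drop the leading "@" from matches that have no "Za " prefix.
a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
matches = re.findall(r'(?:Za\s)?@[A-Za-z0-9_]*', a)
clean = [m[1:] if m.startswith('@') else m for m in matches]
print(clean)  # ['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
```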
Upvotes: 1