malloc

Reputation: 684

How can I define a quantifier for a group of conditions in regex?

I have this string :

"Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

And a regex pattern like this:

((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)

or

(Za\s)?@[A-Za-z0-9_]*

I want it to return this list:

['Za @Foo_Bar','BAR_foo','FooBAR','BArfoo'] 

But I'm getting unexpected results:

>>> import re
>>> import regex
>>> a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
>>> regex.fullmatch(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a) is None
True
>>> re.findall(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
[('Za @Foo_Bar', 'Za ', ''), ('@BAR_foo', '', ''), ('@FooBAR', '', ''), ('@BArfoo', '', '')]

The findall result is closer to what I want, but it contains a lot of junk values:

>>> regex.findall(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
[('Za @Foo_Bar', 'Za ', ''), ('@BAR_foo', '', ''), ('@FooBAR', '', ''), ('@BArfoo', '', '')]
>>> match  = re.search(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
>>> match.groups()
('Za @Foo_Bar', 'Za ', None)

Why does fullmatch return None? How can I get a clean list?

Upvotes: 0

Views: 87

Answers (3)

The fourth bird

Reputation: 163517

As an alternative, you could use re.split with the pattern (?<!\AZa):? @, which splits on an optional colon followed by a space and an @, except where that separator directly follows the Za at the start of the string:

import re
s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
print(re.split(r'(?<!\AZa):? @', s))

Result

['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
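
To see what the lookbehind contributes, here is a minimal sketch that runs the same split with and without it. The variable names without_guard and with_guard are only illustrative.

```python
import re

s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

# Without the lookbehind, the " @" right after the leading "Za" is a split point too.
without_guard = re.split(r':? @', s)

# With the lookbehind, a separator directly preceded by "Za" at the very start
# of the string is skipped, so the first item stays in one piece.
with_guard = re.split(r'(?<!\AZa):? @', s)

print(without_guard)  # ['Za', 'Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
print(with_guard)     # ['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
```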


Upvotes: 1

Toto

Reputation: 91508

Don't use capturing groups:

import re

s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
g = re.findall(r'(?:Za\s)@\w+|(?<=@)\w+', s)
print(g)

Output:

['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']

Explanation:

  (?:Za\s)  # non-capturing group: "Za" followed by a whitespace character
  @         # literal @
  \w+       # 1 or more word characters
|           # or
  (?<=@)    # lookbehind: make sure there is an @ just before
  \w+       # 1 or more word characters
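
The annotated pattern above can also be kept in the code itself by compiling with re.VERBOSE, which ignores whitespace in the pattern and treats # as the start of a comment. This is only a sketch of the same regex in that style; the name mention is made up for the example.

```python
import re

s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

# Same pattern as above, written out with inline comments via re.VERBOSE.
mention = re.compile(r"""
    (?:Za\s)   # non-capturing group: Za followed by a whitespace character
    @          # literal @
    \w+        # one or more word characters
  |            # or
    (?<=@)     # lookbehind: the previous character must be @
    \w+        # one or more word characters
""", re.VERBOSE)

result = mention.findall(s)
print(result)  # ['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']
```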

Upvotes: 1

Martijn Pieters

Reputation: 1124070

regex.fullmatch() is the wrong method to use here; I don't think you understood what it is for.

From the regex module documentation:

fullmatch behaves like match, except that it must match all of the string.

Your pattern doesn't match all of the input string. Only if the pattern covers everything, from the first character to the last, will fullmatch() return a match.

re.match() only matches at the start of the string, as if you had added \A to the start of your pattern; regex.fullmatch() matches as if you had added \A to the start and \Z to the end of your pattern.
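
A small sketch of that difference, using the standard library's re.fullmatch() (available since Python 3.4) as a stand-in for regex.fullmatch(), on the assumption that the two behave the same for this pattern. The variable names are illustrative.

```python
import re

a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
pattern = r'(?:Za\s)?@[A-Za-z0-9_]*'

# match() is anchored at the start only, so it succeeds on the first mention.
start_only = re.match(pattern, a)
print(start_only.group(0))  # Za @Foo_Bar

# fullmatch() must also reach the end of the string, so it fails here.
whole = re.fullmatch(pattern, a)
print(whole)  # None

# Anchoring the pattern by hand with \A and \Z gives the same outcome.
anchored = re.search(r'\A(?:' + pattern + r')\Z', a)
print(anchored)  # None

# On a string the pattern covers completely, fullmatch() does succeed.
covered = re.fullmatch(pattern, "Za @Foo_Bar")
print(covered.group(0))  # Za @Foo_Bar
```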

Note that you don't need the |(@[A-Za-z0-9_]*) option; that pattern is fully covered by the (Za\s)?@[A-Za-z0-9_]* part already, when (Za\s)? doesn't match.
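
One way to check this is to compare the whole matches of both patterns with re.finditer(), where group(0) is always the full match regardless of capturing groups. A minimal sketch with illustrative variable names:

```python
import re

a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

with_alternative = r'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)'
without_alternative = r'(Za\s)?@[A-Za-z0-9_]*'

# group(0) is the whole match, independent of any capturing groups.
first = [m.group(0) for m in re.finditer(with_alternative, a)]
second = [m.group(0) for m in re.finditer(without_alternative, a)]

print(first)            # ['Za @Foo_Bar', '@BAR_foo', '@FooBAR', '@BArfoo']
print(first == second)  # True

# The third group, i.e. the extra alternative, never takes part in a match.
third_unused = all(m.group(3) is None for m in re.finditer(with_alternative, a))
print(third_unused)     # True
```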

To get a clean list, use re.findall() but use a (?:...) non-capturing group to cover the optional part, so you don't get separate strings in the re.findall() result:

>>> import re
>>> a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
>>> re.findall(r'(?:Za\s)?@[A-Za-z0-9_]*', a)
['Za @Foo_Bar', '@BAR_foo', '@FooBAR', '@BArfoo']

With no capturing groups, the whole match is returned.
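
For comparison, this sketch shows how the shape of the re.findall() result changes with the number of capturing groups in the pattern. The variable names are illustrative.

```python
import re

a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

# No capturing groups: a list of whole matches.
no_groups = re.findall(r'(?:Za\s)?@[A-Za-z0-9_]*', a)
print(no_groups)   # ['Za @Foo_Bar', '@BAR_foo', '@FooBAR', '@BArfoo']

# One capturing group: a list of that group's text only
# (an empty string where the group did not take part in the match).
one_group = re.findall(r'(Za\s)?@[A-Za-z0-9_]*', a)
print(one_group)   # ['Za ', '', '', '']

# Two capturing groups: a list of tuples, one element per group.
two_groups = re.findall(r'((Za\s)?@[A-Za-z0-9_]*)', a)
print(two_groups)
# [('Za @Foo_Bar', 'Za '), ('@BAR_foo', ''), ('@FooBAR', ''), ('@BArfoo', '')]
```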

Upvotes: 1
