user3556757

Reputation: 3619

python regular expression grouping

My regular expression goal:

"If the sentence has a '#' in it, group all the stuff to the left of the '#' and group all the stuff to the right of the '#'. If the sentence doesn't have a '#', then just return the entire sentence as one group."

Examples of the two cases:

A) '120x4#Words' -> ('120x4', 'Words')
B) '[email protected]' -> ('[email protected]')

I made a regular expression that parses case A correctly:

(.*)(?:#(.*))

# List the groups found
>>> r.groups()
(u'120x4', u'words')

But of course this won't work for case B. I need to make "# and everything to the right of it" optional.

So I tried to use the '?' "zero or one" operator on that second grouping to indicate it's optional:
(.*)(?:#(.*))?

But it gives me bad results. The first grouping eats up the entire string.

# List the groups found
>>> r.groups()
(u'120x4#words', None)

I guess I'm either misunderstanding how the zero-or-one '?' operator works on groupings, or misunderstanding how the greedy first group grabs the entire string. I did try to make the first group 'reluctant', but then it matched nothing except an empty string:

(.*?)(?:#(.*))?


# List the groups found
>>> r.groups()
(u'', None)

Upvotes: 0

Views: 309

Answers (4)

Peter Sutton

Reputation: 1223

Here's a verbose re solution, but you're better off using str.split.

import re

REGEX = re.compile(r'''
    \A
    (?P<left>.*?)
    (?:
        [#]
        (?P<right>.*)
    )?
    \Z
''', re.VERBOSE)


def parse(text):
    match = REGEX.match(text)
    if match:
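        # Drop only groups that did not take part in the match; keep empty
        # strings, so '#Words' still yields two items.
        return tuple(g for g in match.groups() if g is not None)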

print(parse('120x4#Words'))
print(parse('[email protected]'))

Better solution:

def parse(text):
    # Split at most once; the positional form also works on Python 2.
    return text.split('#', 1)

print(parse('120x4#Words'))
print(parse('[email protected]'))

Upvotes: 1

hjpotter92

Reputation: 80653

Simply use the standard str.split function:

s = '120x4#Words'
x = s.split('#')

If you still want a regex solution, use the following pattern:

([^#]+)(?:#(.*))?
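
As a rough sketch of how this pattern behaves on the two cases from the question: the helper name `split_on_hash` and the sample string without a '#' are made up for illustration, and it assumes that anchoring at the start of the string with `re.match` is what's wanted.

```python
import re

# Pattern from the answer above: one or more non-'#' characters,
# then optionally a '#' followed by the rest of the string.
PATTERN = re.compile(r'([^#]+)(?:#(.*))?')


def split_on_hash(text):
    # Hypothetical helper, not part of the original answer.
    match = PATTERN.match(text)
    if match is None:
        # Happens for an empty string or a string that starts with '#',
        # because [^#]+ needs at least one character.
        return None
    # Drop the second group only when it did not take part in the match.
    return tuple(g for g in match.groups() if g is not None)


print(split_on_hash('120x4#Words'))
print(split_on_hash('no hash here'))
```

Because `[^#]+` cannot cross a '#', the first group stops at the first '#' without needing a lazy quantifier, which is why the optional second group gets a chance to match.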

Upvotes: 3

vks

Reputation: 67998

(.*?)#(.*)|(.+)

This should work. See demo.

http://regex101.com/r/oC3nN4/14
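
A minimal sketch of using this alternation from Python: the helper name `split_on_hash_alt` and the sample string without a '#' are assumptions for illustration. Note that the pattern has three groups, and which ones are filled depends on which branch matched.

```python
import re

# Pattern from the answer above: either "left#right" (groups 1 and 2)
# or, if there is no '#', the whole string (group 3).
PATTERN = re.compile(r'(.*?)#(.*)|(.+)')


def split_on_hash_alt(text):
    # Hypothetical helper, not part of the original answer.
    match = PATTERN.match(text)
    if match is None:
        return None  # only an empty string fails both branches
    left, right, whole = match.groups()
    if whole is not None:
        # Second branch matched: no '#' in the string.
        return (whole,)
    # First branch matched: split at the first '#'.
    return (left, right)


print(split_on_hash_alt('120x4#Words'))
print(split_on_hash_alt('no hash here'))
```

The first branch is tried first, so the lazy `(.*?)` stops at the first '#'; the `(.+)` branch is reached only when no '#' exists.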

Upvotes: 1

Kasravnd

Reputation: 107357

Use re.split:

>>> import re
>>> a='120x4#Words'
>>> re.split('#',a)
['120x4', 'Words']
>>> b='[email protected]'
>>> re.split('#',b)
['[email protected]']

Upvotes: 1
