Reputation: 160

Getting rid of an optional space in Python regex script

I'm having a bit of an issue with my regex script and hopefully somebody can help me out.

Basically, I have a regex that I pass to re.findall() in a Python script. My goal is to search strings of varying length for references to Bible verses (e.g. John 3:16, Romans 6, etc.). The regex mostly works, but it sometimes tacks an extra space onto the front of the Bible book name. Here's the call:

versesToFind = re.findall(r'\d?\s?\w+\s\d+:?\d*', str)

To explain the problem better, here are my results when running it on this test string:

str = 'testing testing John 3:16 adsfbaf John 2 1 Kings 4 Romans 4'

Result (from www.pythonregex.com):

[u' John 3:16', u' John 2', u'1 Kings 4', u' Romans 4']

As you can see, John 3:16, John 2 and Romans 4 each have an extra leading space that I want to get rid of. Hopefully my explanation makes sense. Thanks in advance!

Upvotes: 0

Answers (3)

TerryA

Reputation: 59974

Instead of rewriting your regular expression, you can always just strip() the whitespace:

>>> L = [u' John 3:16', u' John 2', u'1 Kings 4', u' Romans 4']
>>> print map(unicode.strip, L)
[u'John 3:16', u'John 2', u'1 Kings 4', u'Romans 4']

The map() call here is equivalent to:

>>> print [i.strip() for i in L]
[u'John 3:16', u'John 2', u'1 Kings 4', u'Romans 4']

Upvotes: 0

Bryan

Reputation: 2078

Using a list comprehension, you can do it in a single line:

versesToFind = [x.strip() for x in re.findall(r'\d?\s?\w+\s\d+:?\d*', str)]

Upvotes: 0

Jared

Reputation: 26397

You can make the digit and the space optional as a single unit by grouping them with parentheses (the ?: just makes the group non-capturing):

'(?:\d\s)?\w+\s\d+:?\d*'
 ^^^    ^

This produces:

>>> s = 'testing testing John 3:16 adsfbaf John 2 1 Kings 4 Romans 4'
>>> re.findall(r'(?:\d\s)?\w+\s\d+:?\d*', s)
['John 3:16', 'John 2', '1 Kings 4', 'Romans 4']
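
As for why the grouping matters: in the original pattern, \d? and \s? are each optional on their own, so \s? can happily consume the space in front of a book name even when there is no digit before it. With (?:\d\s)?, the space can only match when it follows a digit. Here is a small side-by-side sketch (Python 3; the variable names are just for illustration):

```python
import re

s = 'testing testing John 3:16 adsfbaf John 2 1 Kings 4 Romans 4'

# Original pattern: digit and space are independently optional,
# so a lone leading space can end up in the match.
original = r'\d?\s?\w+\s\d+:?\d*'

# Grouped pattern: the space is only allowed together with a leading digit.
grouped = r'(?:\d\s)?\w+\s\d+:?\d*'

original_matches = re.findall(original, s)
grouped_matches = re.findall(grouped, s)

print(original_matches)  # [' John 3:16', ' John 2', '1 Kings 4', ' Romans 4']
print(grouped_matches)   # ['John 3:16', 'John 2', '1 Kings 4', 'Romans 4']
```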

Upvotes: 1
