WestCoastProjects

Reputation: 63032

Matching embedded newlines in python regex

How do I match a string that contains an embedded newline? I have tried various combinations of plain strings, raw strings, (?is), and re.DOTALL, all without success.

Here is a sample of what I have tried:

>>> x="select a.b from a join b \nwhere a.id is not null"
>>> print (x)
select a.b from a join b 
where a.id is not null
>>> y=re.match("(?is)select (.*) from (.*) where (?P<where>.*)",x,re.DOTALL)
>>> y.groupdict()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groupdict'

Note that I have also tried:

>>> x=r"""select a.b from a join b
... where a.id is not null"""

Same (incorrect) result.

I have also tried with/without (?is) and re.DOTALL.

Note: if the embedded newline is removed from the tested string, then the match works perfectly:

>>> nonewline="select a.b from a join b where a.id is not null"
>>> y=re.match("(?is)select (.*) from (.*) where (?P<where>.*)",nonewline,re.DOTALL|re.MULTILINE)
>>> y.groupdict()
{'where': 'a.id is not null'}

Upvotes: 1

Views: 417

Answers (1)

aldeb

Reputation: 6828

I think the problem is that the character immediately before the where keyword in your text is a newline, not a space, so the literal " where " in your pattern has nothing to match.

Your text:

"select a.b from a join b \nwhere a.id is not null"

--------------------------------------------^

Your regex:

(?is)select (.*) from (.*) where (?P<where>.*)

-------------------------------------------^

Try something like this instead:

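import re

x = "select a.b from a join b \nwhere a.id is not null"
# \s+ matches any run of whitespace, including the newline before "where".
# The raw string keeps \s from being treated as a string escape.
y = re.match(r"select\s+(.*?)\s+from\s+(.*?)\s+where\s+(?P<where>.*)",
             x, re.DOTALL)
print(y.groups())
print(y.groupdict())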

Output:

('a.b', 'a join b', 'a.id is not null')
{'where': 'a.id is not null'}
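
If it helps, the same idea can be wrapped in a small function that also avoids the AttributeError from the question when nothing matches. This is only a sketch: parse_select, SQL_PATTERN, and the columns and tables group names are hypothetical names I made up for illustration, and re.IGNORECASE is added on the assumption that you want the case-insensitive behaviour your (?is) flags were asking for.

```python
import re

# Hypothetical helper names, not part of the original question or answer.
SQL_PATTERN = re.compile(
    r"select\s+(?P<columns>.*?)\s+from\s+(?P<tables>.*?)\s+where\s+(?P<where>.*)",
    re.IGNORECASE | re.DOTALL,
)


def parse_select(sql):
    """Return the named groups as a dict, or None if the text does not match."""
    m = SQL_PATTERN.match(sql)
    return m.groupdict() if m else None


x = "select a.b from a join b \nwhere a.id is not null"

# The original pattern needs a literal space right before "where",
# but the text has a newline there, so re.match returns None.
print(re.match("(?is)select (.*) from (.*) where (?P<where>.*)", x))

# With \s+ between the keywords the newline is accepted.
print(parse_select(x))

# A non-matching string gives None instead of raising AttributeError.
print(parse_select("not a query"))
```

Checking the result for None before calling groupdict() is what prevents the traceback you saw; re.match returns None whenever the pattern does not match at the start of the string.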

Upvotes: 2
