Nitin Kr

Reputation: 531

pyparsing scanString cannot match a regex that starts with whitespace

I am using the regular expression below with pyparsing, but it doesn't produce any output. Any idea what I am doing wrong here?

>>> import pyparsing as pp
>>> pat = pp.Regex(r'\s+\w+')
>>> x = " ***    abc   xyz   pqr"
>>> for result, start, end in pat.scanString(x):
...     print(result, start, end)

If the \s+ is removed, the words are matched:

>>> pat = pp.Regex(r'\w+')
>>> x = " ***    abc   xyz   pqr"
>>> for result, start, end in pat.scanString(x):
...     print(result, start, end)

['abc'] 8 11
['xyz'] 14 17
['pqr'] 20 23

Upvotes: 1

Views: 126

Answers (1)

Corentin Limier

Reputation: 5006

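According to the pyparsing documentation, whitespace is skipped by default. The skipping happens before an element tries to match, so by the time the Regex is tried the leading spaces have already been consumed and \s+ has nothing left to match: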

During the matching process, whitespace between tokens is skipped by default (although this can be changed).
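A minimal sketch of that effect, assuming pyparsing is installed. It uses the camelCase names from the question (pyparsing 3 also provides snake_case equivalents such as parse_string); the names text, word and spaced_word are just illustrative. The same four-space string is parsed with \w+ and with \s+\w+ under the default settings:

```python
import pyparsing as pp

text = "    abc"  # four leading spaces, then a word

word = pp.Regex(r"\w+")
spaced_word = pp.Regex(r"\s+\w+")

# Default behaviour: leading whitespace is skipped before \w+ is tried,
# so the match starts at "abc".
print(word.parseString(text))  # prints ['abc']

# The same skipping consumes the spaces that \s+ needs, so this raises.
try:
    spaced_word.parseString(text)
    matched = True
except pp.ParseException:
    matched = False
print("spaced_word matched:", matched)  # prints spaced_word matched: False
```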

However, the Regex class inherits from ParserElement, which has a leaveWhitespace() method:

leaveWhitespace(self)

Disables the skipping of whitespace before matching the characters in the ParserElement's defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.
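A small sketch of the method in isolation, again assuming pyparsing is installed (spaced_word and text are illustrative names). leaveWhitespace() changes the element in place and also returns it, so it can either be called on its own line or chained onto the constructor:

```python
import pyparsing as pp

spaced_word = pp.Regex(r"\s+\w+")
text = "    abc"

# leaveWhitespace() turns off whitespace skipping on this element in place
# and returns the same element, so the call can be chained if preferred.
returned = spaced_word.leaveWhitespace()
print(returned is spaced_word)  # prints True

# With skipping disabled, \s+ now sees the leading spaces.
result = spaced_word.parseString(text)
print(result)  # prints ['    abc']
```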

So this code works:

>>> import pyparsing as pp
>>> pat = pp.Regex(r'\s+\w+').leaveWhitespace()
>>> x = " ***    abc   xyz   pqr"
>>> for result, start, end in pat.scanString(x):
...     print(result, start, end)

['    abc'] 4 11
['   xyz'] 11 17
['   pqr'] 17 23
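As a cross-check outside pyparsing: Python's standard re module never skips whitespace on its own, so re.finditer with the same pattern should report the same spans as the leaveWhitespace() version above. This is only a comparison sketch using the stdlib, not part of the pyparsing API:

```python
import re

x = " ***    abc   xyz   pqr"

# re.finditer does no implicit whitespace skipping, so \s+\w+ sees the
# spaces before each word, just like a pyparsing element after leaveWhitespace().
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\s+\w+", x)]
for text, start, end in spans:
    print(repr(text), start, end)
```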

Upvotes: 3
