Danial

Reputation: 612

Python PLY syntax error: can't parse d[0-9]+

I'm trying to parse a list like this: d0,d1,d2,d3,....d456,d457....

To parse this in python-ply, I tried each of the following as the token expression:

t_DID                   =   r'[d][0-9]+'
t_DID                   =   r'd[0-9]+'
t_DID                   =   r'\d[0-9]+'

But each of them gives me an error.

When I enter 1, I get: DEBUG:root:Syntax error at '1'

When I enter d, I get: DEBUG:root:Syntax error at 'd'

What is the correct token pattern for this input, and how can I resolve the error?

Upvotes: 0

Views: 134

Answers (1)

rici

Reputation: 241741

None of those patterns match either d or 1.

  • r'[d][0-9]+' and r'd[0-9]+' match a d followed by at least one digit. So they will match d1 or d234, but they won't match d, because it is not followed by a digit, and they won't match 1, because it doesn't start with d.

  • r'\d[0-9]+' matches a digit (\d) followed by at least one more digit. So it won't match any string starting with d, and it won't match 1, because it requires at least two digits. But it will match 12, 274 and 29847502948375029384750293485702938750493875.
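
The two points above can be checked with a short script. This is only a minimal sketch using the standard re module; the helper name matched_text and the sample inputs are made up for illustration and are not part of PLY or of the question's code.

```python
import re

# The three patterns tried in the question.
PATTERNS = [r'[d][0-9]+', r'd[0-9]+', r'\d[0-9]+']

# Illustrative sample inputs (an assumption, not taken from a real token stream).
SAMPLES = ['d', '1', 'd1', 'd234', '12']


def matched_text(pattern, text):
    """Return the prefix of text matched by pattern, or None if nothing matches."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


# Print, for each pattern, what (if anything) it matches in each sample.
for pattern in PATTERNS:
    results = {sample: matched_text(pattern, sample) for sample in SAMPLES}
    print(f"{pattern!r}: {results}")
```

With these samples, neither d nor 1 on its own should be matched by any of the three patterns, which is consistent with the errors reported in the question.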

You can read about Python regular expressions in the Python docs, which also cover the \ escape codes, including \d.

It's easy to build an interactive tool which lets you experiment with Python regular expressions. Here's a very simple example, which could be improved a lot:

$ python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import readline
>>> def try_regex(regex):
...   r = re.compile(regex)
...   try:
...     while True:
...       match = r.match(input('--> '))
...       if match:
...         print(f"Matched {match.end()} characters: {match[0]}")
...       else:
...         print("No match")
...   except EOFError:
...     pass
... 
>>> try_regex(r'd[0-9]+')
--> d1
Matched 2 characters: d1
--> d123
Matched 4 characters: d123
--> 1
No match
--> d
No match
--> d123 abc
Matched 4 characters: d123
--> d123abc
Matched 4 characters: d123
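
The last two inputs in the session show that re.match only anchors at the start of the input: d123 abc and d123abc both match, because the first four characters fit the pattern and whatever follows is ignored. If you instead want to test whether a whole string is a single d-number, re.fullmatch is one option. A small sketch, where is_did is a hypothetical helper name chosen for this example:

```python
import re

DID = re.compile(r'd[0-9]+')


def is_did(text):
    """Return True only if the entire string is 'd' followed by one or more digits."""
    return DID.fullmatch(text) is not None


# Compare prefix matching (match) with whole-string matching (fullmatch).
for sample in ['d123', 'd123 abc', 'd123abc', 'd', '1']:
    prefix_ok = DID.match(sample) is not None
    print(f"{sample!r}: match={prefix_ok}, fullmatch={is_did(sample)}")
```

For experimenting with token patterns, the prefix behaviour of match is probably the more relevant one, since a tokenizer normally takes one token off the front of the input and continues from where the match ended.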


Upvotes: 1
