Reputation: 612
I'm trying to parse this list:
d0,d1,d2,d3,....d456,d457....
To parse this in python-ply, I tried each of these token expressions:
t_DID = r'[d][0-9]+'
t_DID = r'd[0-9]+'
t_DID = r'\d[0-9]+'
But each of them gives me an error.
When I enter 1, it gives me: DEBUG:root:Syntax error at '1'
And when I enter d, it gives me: DEBUG:root:Syntax error at 'd'
What would be the correct token for this pattern?
How can I resolve this?
Upvotes: 0
Views: 134
Reputation: 241741
None of those patterns match either d or 1.
r'[d][0-9]+' and r'd[0-9]+' match a d followed by at least one digit. So they will match d1 or d234, but they won't match d because it is not followed by a digit, and they will not match 1 because it doesn't start with d.
r'\d[0-9]+' matches a digit (\d) followed by at least one more digit. So it won't match any string starting with d, and it won't match 1 because it requires at least two digits. But it will match 12, 274 and 29847502948375029384750293485702938750493875.
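The behaviour described above can be checked directly with the re module. This is only a minimal sketch: the helper name matches and the sample strings are made up for illustration, and it uses re.fullmatch so that a pattern counts only if it covers the whole input.

```python
import re

# The three patterns from the question.
PATTERNS = [r'[d][0-9]+', r'd[0-9]+', r'\d[0-9]+']
# Illustrative inputs (not an exhaustive test set).
SAMPLES = ['d', '1', 'd1', 'd234', '12', '274']

def matches(pattern, text):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

for pattern in PATTERNS:
    matched = [s for s in SAMPLES if matches(pattern, s)]
    print(f"{pattern:12} matches {matched}")
```

The first two patterns should report only d1 and d234, and the third only 12 and 274; neither d nor 1 appears for any of them.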
You can read about Python regular expressions in the Python docs (the \ escape codes, including \d, are covered there).
It's easy to build an interactive tool which lets you experiment with Python regular expressions. Here's a very simple example, which could be improved a lot:
$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import readline
>>> def try_regex(regex):
...     r = re.compile(regex)
...     try:
...         while True:
...             match = r.match(input('--> '))
...             if match:
...                 print(f"Matched {match.end()} characters: {match[0]}")
...             else:
...                 print("No match")
...     except EOFError:
...         pass
...
>>> try_regex(r'd[0-9]+')
--> d1
Matched 2 characters: d1
--> d123
Matched 4 characters: d123
--> 1
No match
--> d
No match
--> d123 abc
Matched 4 characters: d123
--> d123abc
Matched 4 characters: d123
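To see how d[0-9]+ behaves on the whole list from the question, here is a rough plain-re sketch of what a lexer does with it. This is not PLY itself: the names TOKEN_RE and tokenize are hypothetical, and the COMMA token is an assumption about how the separators would be handled.

```python
import re

# Hypothetical token table: DID is the pattern discussed above,
# COMMA is an assumed separator token for the list in the question.
TOKEN_RE = re.compile(r'(?P<DID>d[0-9]+)|(?P<COMMA>,)')

def tokenize(text):
    """Yield (token_type, value) pairs; raise if no pattern matches."""
    pos = 0
    while pos < len(text):
        # Try to match a token starting exactly at the current position.
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(
                f"Illegal character {text[pos]!r} at position {pos}")
        yield m.lastgroup, m.group()
        pos = m.end()

print(list(tokenize('d0,d1,d456')))
```

With this sketch, a bare d or a bare 1 raises an error because no token pattern matches it, which mirrors the behaviour seen in the question.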
Upvotes: 1