Reputation: 555
I need to parse this string with a single regular expression in Python, saving the value of each group in a specific field. The problem is that one or more of the parameters may be missing or appear in a different order (e.g. domain 66666 ip nonce, with the middle part missing).
3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h
I need to assign:
time=2013-02-10T06:44:30.666821+00:00 (constant format)
domain=domain (string)
code=66666 (integer)
ip=127.0.0.1 (string)
pubvalue=kjiduensofksidoposiw (string of fixed length)
nonce=7896089hujoiuhiuh098h (string)
EDIT
This is an example of how the string can vary:
123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h
Thank you in advance for showing me the way.
Upvotes: 1
Views: 150
Reputation: 2790
It's probably not a good idea to use one regex to parse the whole string, but I think the solution is to use named groups (see: Named groups on Regex Tutorial).
A named group is captured with (?P<nameofgroup>bla). For example, you can match the ip with:
import re
line = "3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h"
# Raw string so the backslashes reach the regex engine unchanged; "line" avoids shadowing the built-in str.
match = re.search(r"\[(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]", line)
print(match.groupdict())  # {'ip': '127.0.0.1'}
Just extend this regular expression with the patterns you need to match the other fields. You can make a group optional by placing a ? after the group's parentheses, like: (?P<ip>pattern)?. If the pattern could not be matched, the element in the dict will be None.
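As a rough sketch of how that extension might look for the two sample lines in the question: the fixed prefix is matched in order, and each query parameter sits in its own optional lookahead, so pubvalue and nonce may appear in either order or be absent. The name parse_line, the literal word "constant", and the exact field formats are assumptions drawn from the samples, not something the question confirms.

```python
import re

# Fixed prefix in order, then one optional lookahead per query parameter.
# The literal "constant" and the field formats are assumptions based on
# the two sample lines in the question.
LINE_RE = re.compile(
    r"^\S+\s+"
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})\s+"
    r"(?P<domain>\S+)\s+constant\s+"
    r"(?P<code>\d+)\s+"
    r"sync:\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]\s+"
    r"Request:\s*"
    # Both lookaheads start scanning from the same position, so parameter
    # order does not matter; the trailing ? makes each parameter optional.
    r"(?=(?:.*?\bpubvalue=(?P<pubvalue>[^&\s]*))?)"
    r"(?=(?:.*?\bnonce=(?P<nonce>[^&\s]*))?)"
)


def parse_line(line):
    """Return a dict of the named groups, or None if the prefix does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


sample = ("123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 "
          "sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw"
          "&nonce=7896089hujoiuhiuh098h")
print(parse_line(sample))
```

All values come back as strings, so code would still need int() if an integer is wanted. A parameter that is missing from the line ends up as None in the dict.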
But note: it is not a good idea to do this in only one regex. It will be slow (because of backtracking and stuff) and the regex will be long and complicated to maintain!
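For comparison, here is a minimal sketch of a two-step alternative, assuming Python 3: a short regex handles only the fixed prefix, and the standard library's urllib.parse.parse_qs splits the query string, so parameter order and missing parameters need no special handling in the pattern. The names PREFIX_RE and parse_line_two_step are made up for illustration, and the prefix layout is assumed from the sample lines.

```python
import re
from urllib.parse import parse_qs

# Only the fixed, ordered part of the line is matched by the regex.
# The layout (including the literal "constant") is assumed from the samples.
PREFIX_RE = re.compile(
    r"^\S+\s+(?P<time>\S+)\s+(?P<domain>\S+)\s+constant\s+"
    r"(?P<code>\d+)\s+sync:\[(?P<ip>[^\]]+)\]\s+Request:\s*(?P<query>\S*)"
)


def parse_line_two_step(line):
    """Parse the prefix with a regex and the key=value pairs with parse_qs."""
    m = PREFIX_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # parse_qs maps each key to a list of values; take the first, if any.
    params = parse_qs(fields.pop("query"))
    for name in ("pubvalue", "nonce"):
        values = params.get(name)
        fields[name] = values[0] if values else None
    return fields
```

Note that parse_qs also URL-decodes the values (for example, + becomes a space), which may or may not be what you want for these fields.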
Upvotes: 1