Dev

Reputation: 215

Parsing a string using a regular expression?

my_string = "Value1=Product Registered;Value2=Linux;Value3=C:5;C++:5;Value4=43;"

I was using the following regex:

tokens = re.findall(r'([^;]+)=([^;]+)', my_string, re.I)

I need to parse Value1, Value2, etc. and put their values into the database. For example, I need to store "C:5;C++:5" for Value3, but with the regex above I only get "C:5", because the value pattern stops at the first ";". What would be a better way to do this?

Thanks!

Upvotes: 0

Views: 217

Answers (1)

Danica

Reputation: 28856

It seems reasonable to assume that the key names don't contain semicolons. If that isn't true, then as Philipp pointed out the language is ambiguous. But if it is, you can use a lookahead to tell which ; is a separator: a separator has to be followed by a run of characters that are neither ; nor =, and then either an = or the end of the string:

>>> import re
>>> my_string = "Value1=Product Registered;Value2=Linux;Value3=C:5;C++:5;Value4=43;"
>>> r = re.compile(r'([^;]+)=([^=]+);(?=[^;=]*(?:=|$))')
>>> r.findall(my_string)
[('Value1', 'Product Registered'),
 ('Value2', 'Linux'),
 ('Value3', 'C:5;C++:5'),
 ('Value4', '43')]
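
Since the goal is to store each value under its key, one way to use the result is to turn the pairs into a dict before writing to the database. This is only a sketch: `PAIR_RE` and `parse_pairs` are names made up for illustration, and it assumes keys are unique and values never contain an `=`, since the pattern above cannot handle that case.

```python
import re

# Same pattern as above: a ';' counts as a separator only when the text
# after it (up to the next ';' or '=') is followed by '=' or end-of-string.
PAIR_RE = re.compile(r'([^;]+)=([^=]+);(?=[^;=]*(?:=|$))')

def parse_pairs(text):
    """Hypothetical helper: map each key to its raw value string."""
    return dict(PAIR_RE.findall(text))

my_string = "Value1=Product Registered;Value2=Linux;Value3=C:5;C++:5;Value4=43;"
pairs = parse_pairs(my_string)
print(pairs["Value3"])  # C:5;C++:5
```

If a key can repeat, keep the list of tuples from `findall` instead, because `dict` keeps only the last value for each key.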

Upvotes: 3
