xyliu00

Reputation: 746

Use Python regex to find key=value pairs separated by commas, but keep quoted portions together

How do you get the key/value pairs from the following string?

s='pairs=<A=name,B=2,C="Last, First">'

The part within the angle brackets <...> contains K=V pairs separated by commas. A value may contain commas inside double quotes.

My clunky solution is to take the portion between the brackets, find the positions of the commas that are not inside quotes, slice the substring at those positions, split each piece into key and value on '=', and turn the whole thing into a dict.
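The manual approach described above can be sketched roughly like this. `parse_pairs` is a hypothetical helper name, and the sketch assumes a single `<...>` section and balanced, unescaped double quotes.

```python
def parse_pairs(s):
    """Hypothetical helper: parse 'name=<K=V,...>' into a dict.

    Assumes exactly one <...> section and balanced double quotes
    with no escaping.
    """
    inner = s[s.index("<") + 1 : s.rindex(">")]
    pieces, start, in_quotes = [], 0, False
    for i, ch in enumerate(inner):
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every double quote
        elif ch == "," and not in_quotes:
            pieces.append(inner[start:i])  # cut at an unquoted comma
            start = i + 1
    pieces.append(inner[start:])  # the last piece has no trailing comma
    # Split on the first '=' only, so an '=' inside a value survives.
    return dict(piece.split("=", 1) for piece in pieces)


print(parse_pairs('pairs=<A=name,B=2,C="Last, First">'))
# {'A': 'name', 'B': '2', 'C': '"Last, First"'}
```

It walks the string once, so it is not actually clunky in terms of cost; the regex versions mainly buy brevity.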

A regex should be able to do this in a much more straightforward way, right?

I got this far:

re.findall('([A-Z]+[0-9]*)=([^,]*)', s)

but the result is not right:

[('A', 'name'), ('B', '2'), ('C', '"Last')]

How do I ignore the commas inside the quotes?

EDIT

I made a mistake: my original regex pattern would not work if the keys can contain lowercase letters.

Taking that into account, I combined eph's and vks's solutions:

s='Pairs=<Aa=name,Bb=2,Cc="Last, First">'
re.findall('([A-Za-z]+[0-9]*)=("[^"]*"|[^,]*)', re.findall(r"<([^>]*)>",s)[0]) 

and it seems to work.

Can this solution be improved?
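One possible refinement, sketched here as an assumption rather than as either answerer's code: `re.search(...).group(1)` avoids building a list just to take `[0]`, and the pairs can go straight into a dict. Stripping the quotes is optional and assumes you don't want them in the stored value.

```python
import re

s = 'Pairs=<Aa=name,Bb=2,Cc="Last, First">'

# Take only the text between < and >, so the outer "Pairs=" is not matched.
inner = re.search(r"<([^>]*)>", s).group(1)

# A quoted value is tried first; otherwise the value runs up to the next comma.
pairs = re.findall(r'([A-Za-z]+[0-9]*)=("[^"]*"|[^,]*)', inner)

# Build a dict, dropping the surrounding quotes from quoted values.
result = {key: value.strip('"') for key, value in pairs}
print(result)
# {'Aa': 'name', 'Bb': '2', 'Cc': 'Last, First'}
```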

Upvotes: 2

Views: 1538

Answers (2)

vks

Reputation: 67978

The correct way would be to extract everything between < and >, and then split on the commas that are not inside quotes.

import re
s='pairs=<A=name,B=2,C="Last, First">'
z = re.split(r',(?=(?:[^"]*"[^"]*")*[^"]*$)', re.findall(r"<([^>]*)>", s)[0])
print(z)

Output:['A=name', 'B=2', 'C="Last, First"']

Now you can easily make pairs by splitting each item on the first =.

print([x.split("=", 1) for x in z])

Output:[['A', 'name'], ['B', '2'], ['C', '"Last, First"']]
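The lookahead in the split pattern accepts a comma only if an even number of double quotes follows it up to the end of the string, which means the comma is outside any quoted section. This assumes quotes are balanced and never escaped. A minimal sketch that goes straight to a dict (the name `UNQUOTED_COMMA` is just illustrative):

```python
import re

s = 'pairs=<A=name,B=2,C="Last, First">'
inner = re.findall(r"<([^>]*)>", s)[0]

# Split on a comma only if the rest of the string contains an even number
# of double quotes, i.e. the comma is not inside a quoted section.
UNQUOTED_COMMA = r',(?=(?:[^"]*"[^"]*")*[^"]*$)'
items = re.split(UNQUOTED_COMMA, inner)

# maxsplit=1 keeps any '=' that appears inside a value.
pairs = dict(item.split("=", 1) for item in items)
print(pairs)
# {'A': 'name', 'B': '2', 'C': '"Last, First"'}
```

Because the lookahead rescans the remainder of the string at every comma, the cost grows quadratically with input length; for short strings like this it does not matter.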

Upvotes: 1

eph

Reputation: 2028

re.findall('([A-Z]+[0-9]*)=("[^"]*"|[^,]*)', s)
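The pattern works because the alternation tries the quoted form first: a value that starts with `"` is consumed up to the closing quote, commas included, and any other value stops at the next comma. A short demonstration on the question's string:

```python
import re

s = 'pairs=<A=name,B=2,C="Last, First">'

# "[^"]*" is tried before [^,]*, so a quoted value keeps its inner comma.
pattern = r'([A-Z]+[0-9]*)=("[^"]*"|[^,]*)'
print(re.findall(pattern, s))
# [('A', 'name'), ('B', '2'), ('C', '"Last, First"')]
```

Run on the whole string, `[^,]*` can run into the closing `>`: for `'pairs=<A=1,B=2>'` the last pair comes back as `('B', '2>')`. That is one reason to extract the bracketed part first.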

Upvotes: 1
