Python Regular Expression for splitting key/value pairs separated by an equal sign when a value may or may not have quotes

Question

I would like to split a string such as:

CS ID=123 HD=CT NE="HI THERE"

into a list that looks like:

['CS', 'ID=123', 'HD=CT', 'NE=HI THERE']

The shlex.split() function does this, but it is terribly slow. I need a very fast way to do this, probably using Python regular expressions. Any help is appreciated. Thanks!

Edit: I did not anticipate that the re module, when used in Jython, would be nearly as slow as the shlex module. Does anyone know how to do this without the re module, either with Java regular expressions or in some other clever way?

YOU · Accepted Answer

I'm not sure you really want to strip the quotes from the HI THERE part, so this first version keeps the double quotes:

>>> import re
>>> x = '''CS ID=123 HD=CT NE="HI THERE"'''
>>> re.findall(r"""\w+="[^"]*"|\w+='[^']*'|\w+=\w+|\w+""", x)
['CS', 'ID=123', 'HD=CT', 'NE="HI THERE"']

And this version drops the quotes from the HI THERE part:

>>> [''.join(groups) for groups in re.findall(r"""(\w+=)"([^"]*)"|(\w+=)'([^']*)'|(\w+=\w+)|(\w+)""", x)]
['CS', 'ID=123', 'HD=CT', 'NE=HI THERE']
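
Since speed is the concern, it may help to compile the pattern once and reuse it rather than passing the pattern string on every call. The following is only a sketch of that idea: the names `_PAIR` and `split_pairs` are made up for illustration, and I have not measured whether it makes a real difference under Jython.

```python
import re

# Compiled once at import time. Alternatives are tried left to right,
# so the quoted forms are attempted before the bare key=value form.
# (_PAIR and split_pairs are hypothetical names, not from any library.)
_PAIR = re.compile(r"""(\w+=)"([^"]*)"|(\w+=)'([^']*)'|(\w+=\w+)|(\w+)""")


def split_pairs(text):
    """Split text into tokens, dropping the quotes around quoted values."""
    # findall returns one tuple per match with '' for groups that did not
    # participate, so joining each tuple rebuilds the token without quotes.
    return [''.join(groups) for groups in _PAIR.findall(text)]


if __name__ == "__main__":
    print(split_pairs('CS ID=123 HD=CT NE="HI THERE"'))
```

Note that unquoted values are matched with `\w+`, so an unquoted value containing punctuation (for example a dot or a hyphen) would be cut at the first non-word character; quote such values or widen that part of the pattern if your data needs it.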
