Will Dennis

Reputation: 639

turn key="value" string into a dict

I have a string with the following format:

author="PersonsName" date="1183050420" format="1.1" version="1.2"

I want to turn it into a Python dict, like so:

{'author': 'PersonsName', 'date': '1183050420', 'format': '1.1', 'version': '1.2'}

I have tried to do so using re.split on the string, like so:

attribs = (re.split('(=?" ?)', twikiattribs))

thinking I would get a list back like:

['author', 'PersonsName', 'date', '1183050420', 'format', '1.1', 'version', '1.2']

that I could then turn into a dict, but instead I'm getting:

['author', '="', 'PersonsName', '" ', 'date', '="', '1183050420', '" ', 'format', '="', '1.1', '" ', 'version', '="', '1.2', '"', '']

So, before I go further down the re.split path: is there a generally better way to achieve what I'm trying to do? And if the solution does involve re.split, how can I write a regex that splits on any of the strings =", "_ (where "_" is a space character), or just ", so that I get a list with the keys in the odd positions and the values in the even ones?

Upvotes: 3

Views: 3785

Answers (7)

Aze

Reputation: 129

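This might help other people in cases where re.findall() doesn't, for example when the input string is already a Python literal.

# grabbing input: a string holding a Python literal (dict, list, etc.)
input1 = "{'author': 'PersonsName', 'date': '1183050420'}"

# creating a phantom variable assignment as source text
phantom = 'variable_name = ' + input1

# executing the phantom; exec() returns None, so the result is read from the namespace
namespace = {}
exec(phantom, namespace)

# storing the phantom variable in a live one
output = namespace['variable_name']

# printing the stored phantom variable
print(output)

What it essentially does is prepend a variable name to your input and execute the resulting assignment, which creates that variable.

For example, if your list comes back as the string "[[1, 2], [3, 4]]", this executes as variable_name = [[1, 2], [3, 4]]

which turns the string back into the object it represents.

Reading variable_name directly after a bare exec() makes linters report an undefined name, since the variable doesn't exist until the code runs; passing an explicit namespace dict, as above, avoids that.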

Upvotes: 0

Saksham Varma

Reputation: 2130

A non-regex, list-comprehension one-liner (note that the values keep their surrounding double quotes):

>>> s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> dict([tuple(x.split('=')) for x in s.split()])
{'date': '"1183050420"', 'format': '"1.1"', 'version': '"1.2"', 'author': '"PersonsName"'}

Upvotes: 3

micce

Reputation: 140

Try

>>> s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> eval('dict(' + s.replace(" ", ",") + ')')
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}

This assumes, as in the other answers, that the values contain no spaces.

Be careful with eval(), though: it executes whatever expression it is given, so never use it on untrusted input.

Upvotes: 0

Elisha

Reputation: 4951

You can also do it without re, in one line:

>>> data = '''author="PersonsName" date="1183050420" format="1.1" version="1.2"'''
>>> {k:v.strip('"') for k,v in [i.split("=",1) for i in data.split(" ")]}
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}

If whitespace is allowed inside the values, split on '" ' (a quote followed by a space) instead:

>>> {k:v.strip('"') for k,v in [i.split("=",1) for i in data.split('" ')]}
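To make the whitespace case concrete, here is a small sketch of the second one-liner. The input string with a space inside the author value is a made-up example, not taken from the question, and the approach still assumes that no value itself contains a quote followed by a space.

```python
# Hypothetical input: the author value contains a space.
data = 'author="Persons Name" date="1183050420" format="1.1" version="1.2"'

# Split on the closing quote plus space, so spaces inside values survive.
items = data.split('" ')

# Split each item once on "=" and strip the remaining quotes from the value.
parsed = {k: v.strip('"') for k, v in (item.split("=", 1) for item in items)}

print(parsed)
```

Splitting on a plain space here would cut "Persons Name" in two and make the dict construction fail, which is why the delimiter includes the closing quote.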

Upvotes: 4

roippi

Reputation: 25954

The way I'd personally parse it:

import shlex

s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'

dict(x.split('=') for x in shlex.split(s))
Out[12]: 
{'author': 'PersonsName',
 'date': '1183050420',
 'format': '1.1',
 'version': '1.2'}
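One reason shlex may be attractive here is that it follows shell-style quoting rules, so quoted values can contain spaces. The sketch below illustrates that with a made-up input (the space in the author value is an assumption, not part of the question), and splits each token only once on "=" in case a value contains an equals sign.

```python
import shlex

# Hypothetical input: the author value contains a space.
s = 'author="Persons Name" date="1183050420" format="1.1" version="1.2"'

# shlex.split() removes the quotes and keeps each key=value pair as one token.
tokens = shlex.split(s)

# Split each token once on "=" to get (key, value) pairs.
parsed = dict(token.split('=', 1) for token in tokens)

print(parsed)
```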

Upvotes: 3

Rob Watts

Reputation: 7146

The problem is that you included parentheses in your regex, which turns the pattern into a capturing group, so the matched delimiters are included in the result of the split. Assign attribs like this:

attribs = re.split('=?" ?', twikiattribs)

and it will work as expected. This does return a blank string (due to the final " in your input string), so you'll want to use attribs[:-1] when creating the dictionary.
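As a rough sketch of the remaining step, the flat list can be paired up with slicing and zip(). The variable name result is only illustrative.

```python
import re

twikiattribs = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'

# Split without a capturing group so the delimiters are dropped.
attribs = re.split('=?" ?', twikiattribs)

# The closing quote at the end of the input leaves one trailing empty string.
attribs = attribs[:-1]

# Keys sit at indices 0, 2, 4, ... and values at indices 1, 3, 5, ...
result = dict(zip(attribs[::2], attribs[1::2]))

print(result)
```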

Upvotes: 1

Martijn Pieters

Reputation: 1121744

Use re.findall():

dict(re.findall(r'(\w+)="([^"]+)"', twikiattribs))

re.findall(), when presented with a pattern with multiple capturing groups, returns a list of tuples, each nested tuple containing the captured groups. dict() happily takes that output and interprets each nested tuple as a key-value pair.

Demo:

>>> import re
>>> twikiattribs = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> re.findall(r'(\w+)="([^"]+)"', twikiattribs)
[('author', 'PersonsName'), ('date', '1183050420'), ('format', '1.1'), ('version', '1.2')]
>>> dict(re.findall(r'(\w+)="([^"]+)"', twikiattribs))
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}

re.split() also behaves differently based on capturing groups; the text on which you split is included in the output if grouped. Compare the output with and without the capturing group:

>>> re.split('(=?" ?)', twikiattribs)
['author', '="', 'PersonsName', '" ', 'date', '="', '1183050420', '" ', 'format', '="', '1.1', '" ', 'version', '="', '1.2', '"', '']
>>> re.split('=?" ?', twikiattribs)
['author', 'PersonsName', 'date', '1183050420', 'format', '1.1', 'version', '1.2', '']

The re.findall() output is far easier to convert to a dictionary, however.

Upvotes: 5
