Reputation: 639
I have a string with the following format:
author="PersonsName" date="1183050420" format="1.1" version="1.2"
I want to turn it into a Python dict, a la:
{'author': 'PersonsName', 'date': '1183050420', 'format': '1.1', 'version': '1.2'}
I have tried to do so using re.split on the string as so:
attribs = (re.split('(=?" ?)', twikiattribs))
thinking I would get a list back like:
['author', 'PersonsName', 'date', '1183050420', 'format', '1.1', 'version', '1.2']
that I could then turn into a dict, but instead I'm getting:
['author', '="', 'PersonsName', '" ', 'date', '="', '1183050420', '" ', 'format', '="', '1.1', '" ', 'version', '="', '1.2', '"', '']
So, before I follow the re.split line further, is there generally a better way to achieve what I'm trying to do, and/or, if the solution involves re.split, how can I write a regex that will split on any of the strings =", "_ (where "_" is a space char), or just ", to yield a list with the keys in the odd indices and the values in the even?
Upvotes: 3
Views: 3785
Reputation: 129
This might help people for whom re.findall() doesn't work.
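# grabbing input: a string containing a Python literal (dict, list, etc.); example value
input1 = "[[1, 2], [3, 4]]"
# building an assignment statement around the input
phantom = 'variable_name = ' + input1
# executing it at module level creates variable_name (exec() itself returns None)
exec(phantom)
# storing the phantom variable in a live one
output = variable_name
# printing the stored phantom variable
print(output)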
What it essentially does is prepend a variable name to your input and execute the result as an assignment, which creates that variable.
For example, if your list comes back as the string "[[1,2],[3,4]]", this executes as variable_name = [[1,2],[3,4]], which turns it back into an actual list.
It does trigger an undefined-name warning in linters, since the variable doesn't exist until the code runs.
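If the input string only ever holds a plain Python literal (lists, dicts, numbers, strings), a swapped-in alternative worth considering is ast.literal_eval() from the standard library, which evaluates literals without running arbitrary code. This is a minimal sketch, not the approach above; the sample string is an assumption made up for illustration:

```python
import ast

# Hypothetical input: a string that holds a list literal.
text = "[[1, 2], [3, 4]]"

# literal_eval only accepts Python literals, so it cannot execute statements.
value = ast.literal_eval(text)

print(type(value).__name__)
print(value[1][0])
```

Unlike exec(), this returns the value directly, so no phantom variable is needed.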
Upvotes: 0
Reputation: 2130
A non-regex list comprehension one-liner (note that the values keep their surrounding double quotes):
>>> s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> print dict([tuple(x.split('=')) for x in s.split()])
{'date': '"1183050420"', 'format': '"1.1"', 'version': '"1.2"', 'author': '"PersonsName"'}
Upvotes: 3
Reputation: 140
Try
>>> s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> eval('dict(' + s.replace(" ", ",") + ')')
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}
assuming, as earlier, that the values have no spaces in them.
Beware of using eval(), though. Bad things may happen with unexpected input. Don't use it on user input.
Upvotes: 0
Reputation: 4951
You can also do it without re, in one line:
>>> data = '''author="PersonsName" date="1183050420" format="1.1" version="1.2"'''
>>> {k:v.strip('"') for k,v in [i.split("=",1) for i in data.split(" ")]}
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}
If whitespace is allowed inside the values, you can use this line:
>>> {k:v.strip('"') for k,v in [i.split("=",1) for i in data.split('" ')]}
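To make the difference concrete, here is a small sketch of the second variant on a value that contains a space. The sample string (with "Persons Name") and the variable names are assumptions for illustration, and it still assumes no value contains a double quote followed by a space:

```python
data = 'author="Persons Name" date="1183050420" format="1.1" version="1.2"'

# Split on the closing quote plus the separating space,
# so spaces inside a quoted value are left alone.
pairs = [item.split("=", 1) for item in data.split('" ')]

# Each value still carries its opening quote (and the last one its closing quote).
attribs = {key: value.strip('"') for key, value in pairs}

print(attribs)
```

Splitting on a bare space instead would cut "Persons Name" in two and break the key/value unpacking.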
Upvotes: 4
Reputation: 25954
The way I'd personally parse it:
import shlex
s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
dict(x.split('=') for x in shlex.split(s))
Out[12]:
{'author': 'PersonsName',
'date': '1183050420',
'format': '1.1',
'version': '1.2'}
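One possible refinement, sketched under the assumption that values may contain spaces or an equals sign: shlex.split() already keeps quoted values together, and limiting the split to the first '=' keeps any later '=' inside the value. The sample string and the query key are made up for illustration:

```python
import shlex

# Hypothetical input with a space and an '=' inside quoted values.
s = 'author="Persons Name" query="a=b" version="1.2"'

# shlex.split() removes the quotes and keeps each key=value token whole;
# maxsplit=1 splits only on the first '='.
attribs = dict(token.split("=", 1) for token in shlex.split(s))

print(attribs)
```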
Upvotes: 3
Reputation: 7146
The problem is that you included parentheses in your regex, which turns it into a capturing group, so the matched separators are included in the split result. Assign attribs like this:
attribs = (re.split('=?" ?', twikiattribs))
and it will work as expected. This does return a trailing empty string (due to the final " in your input string), so you'll want to use attribs[:-1] when creating the dictionary.
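For completeness, one way to go from that flat list to the dictionary is to pair the even-indexed items (keys) with the odd-indexed items (values). A minimal sketch; the names parts and attribs are just illustrative:

```python
import re

twikiattribs = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'

# Without the capturing group the separators are dropped;
# [:-1] discards the trailing empty string.
parts = re.split(r'=?" ?', twikiattribs)[:-1]

# Keys sit at indices 0, 2, 4, ... and values at 1, 3, 5, ...
attribs = dict(zip(parts[0::2], parts[1::2]))

print(attribs)
```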
Upvotes: 1
Reputation: 1121744
Use re.findall():
dict(re.findall(r'(\w+)="([^"]+)"', twikiattribs))
re.findall(), when presented with a pattern with multiple capturing groups, returns a list of tuples, each nested tuple containing the captured groups. dict() happily takes that output and interprets each nested tuple as a key-value pair.
Demo:
>>> import re
>>> twikiattribs = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> re.findall(r'(\w+)="([^"]+)"', twikiattribs)
[('author', 'PersonsName'), ('date', '1183050420'), ('format', '1.1'), ('version', '1.2')]
>>> dict(re.findall(r'(\w+)="([^"]+)"', twikiattribs))
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}
re.split() also behaves differently based on capturing groups; the text on which you split is included in the output if grouped. Compare the output with and without the capturing group:
>>> re.split('(=?" ?)', twikiattribs)
['author', '="', 'PersonsName', '" ', 'date', '="', '1183050420', '" ', 'format', '="', '1.1', '" ', 'version', '="', '1.2', '"', '']
>>> re.split('=?" ?', twikiattribs)
['author', 'PersonsName', 'date', '1183050420', 'format', '1.1', 'version', '1.2', '']
The re.findall() output is far easier to convert to a dictionary, however.
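If this is needed in more than one place, the re.findall() approach can be wrapped in a small helper. This is only a sketch: the names ATTR_RE and parse_attribs are hypothetical, and the pattern uses [^"]* instead of [^"]+ on the assumption that empty values such as note="" should be kept rather than skipped:

```python
import re

# Key: one or more word characters; value: anything except a double quote
# (possibly empty) between double quotes.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_attribs(text):
    """Return a dict built from key="value" pairs found in text."""
    return dict(ATTR_RE.findall(text))


result = parse_attribs('author="Persons Name" note="" version="1.2"')
print(result)
```

Because the value group matches everything up to the closing quote, spaces inside values are handled without any extra work.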
Upvotes: 5