user2480542

Reputation: 2945

Convert Python String to Dictionary

I have a string:

A = "{user_id:34dd833,category:secondary,items:camera,type:sg_ser}"

I need to convert it to a Python dictionary, so that:

A = {"user_id":"34dd833", "category": "secondary", "items": "camera", "type": "sg_ser"}

On top of that, there are two more issues:

1: the "items" key is supposed to have multiple values, like:

A = {"user_id": "34dd833", "category": "secondary", "items": "camera,vcr,dvd", "type": "sg_ser"}

which arrives as a string in this form:

A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"

So simply splitting on commas no longer works.

2: The order of the keys in the string can be arbitrary as well, so the string may also look like this:

A = "{category:secondary,type:sg_ser,user_id:34dd833,items:camera,vcr,dvd}"

So any approach that assumes a fixed order of keys is wrong as well.

What to do in such a situation? Many thanks.

Upvotes: 0

Views: 183

Answers (2)

abarnert

Reputation: 366053

If we can assume that your input doesn't do any quoting or escaping (your example doesn't, but that doesn't necessarily mean it's a good assumption), and that you can never have comma-separated multiple keys, just multiple values (which probably is a good assumption, because otherwise the format is ambiguous…):

First, let's drop the braces, then split on colons:

>>> A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
>>> A[1:-1].split(':')
['user_id', '34dd833,category', 'secondary,items', 'camera,vcr,dvd,type', 'sg_ser']

So the first entry is the first key, the last entry is the last value(s), and every entry in between is the Nth value(s) followed by a comma followed by the N+1th key. There may be other commas there, but the last one always splits the Nth value(s) from the N+1th key. (And that even works for N=0—there are no commas, so the last comma splits nothing from the 0th key. But it doesn't work for the very last entry, unfortunately. I'll get to that later.)

There are ways we could make this brief, but let's write that out explicitly as code first, so you understand how it works.

>>> d = {}
>>> entries = A[1:-1].split(':')
>>> for i in range(len(entries)-1):
...     key = entries[i].rpartition(',')[-1]
...     value = entries[i+1].rpartition(',')[0]
...     d[key] = value

This is almost right:

>>> d
{'category': 'secondary', 'items': 'camera,vcr,dvd', 'type': '', 'user_id': '34dd833'}

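As mentioned above, it doesn't work for the last one. It should be obvious why; if not, see what rpartition(',') returns for the last value. You can patch that up manually, or just cheat by tacking an extra , on the end (entries = (A[1:-1] + ',').split(':')). But if you think about it, if you use rsplit(',', 1) instead of rpartition(','), then [0] does the right thing for a last value that has no comma in it. (If the last value can itself contain commas, as when items comes last, rsplit would cut off its final element, and the extra-comma trick is the one to use.) For this string rsplit is enough, so let's do that instead.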
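To see the difference concretely, here is a quick comparison of what the two methods return for an entry without a comma and for one with commas. The variable names are made up for this illustration.

```python
last = 'sg_ser'                 # final entry: a value with no key after it
middle = 'camera,vcr,dvd,type'  # middle entry: value(s), then the next key

# rpartition always returns a 3-tuple; when the separator is not found,
# the whole string ends up in the LAST slot, so [0] is empty.
rpart_last = last.rpartition(',')      # ('', '', 'sg_ser')

# rsplit with maxsplit=1 returns a one-element list when there is no comma,
# so [0] is the whole string.
rsplit_last = last.rsplit(',', 1)      # ['sg_ser']

# When a comma is present, both put everything before the last comma at [0].
rpart_middle = middle.rpartition(',')  # ('camera,vcr,dvd', ',', 'type')
rsplit_middle = middle.rsplit(',', 1)  # ['camera,vcr,dvd', 'type']
```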

So, how can we clean this up a bit?

First, let's transform entries into a list of adjacent pairs. Now, for each pair (n, nplus1), n.rpartition(',')[-1] is the key, and nplus1.rsplit(',', 1)[0] is the corresponding value. So:

>>> A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
>>> entries = A[1:-1].split(':')
>>> adjpairs = zip(entries, entries[1:])
>>> d = {k.rpartition(',')[-1]: v.rsplit(',', 1)[0] for k, v in adjpairs}
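One caveat worth checking: with the reordered string from the question, where items comes last, rsplit(',', 1)[0] on the final entry 'camera,vcr,dvd' yields 'camera,vcr'. A minimal sketch that handles both orderings is to fall back on the trailing-comma trick and use rpartition on both sides. The function name parse_flat_dict is hypothetical, and the sketch still assumes there is no quoting or escaping and that values never contain a colon.

```python
def parse_flat_dict(text):
    """Parse '{k:v,k:v1,v2,...}' into a dict of strings (hypothetical helper)."""
    # Drop the braces and append a comma, so that every value,
    # including the last one, is followed by a comma.
    entries = (text[1:-1] + ',').split(':')
    # For each adjacent pair, the key is whatever follows the last comma of
    # the first entry, and the value is whatever precedes the last comma of
    # the second entry.
    return {k.rpartition(',')[-1]: v.rpartition(',')[0]
            for k, v in zip(entries, entries[1:])}


sample = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
reordered = "{category:secondary,type:sg_ser,user_id:34dd833,items:camera,vcr,dvd}"

d1 = parse_flat_dict(sample)
d2 = parse_flat_dict(reordered)
```

Both calls should produce the same mapping, with 'items' mapped to the full 'camera,vcr,dvd' string regardless of where it appears.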

Upvotes: 6

Jon Clements

Reputation: 142216

Here's another way (not particularly robust, but shows it's possible on the sample data):

import re
text = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
print(dict(re.findall(r'(\w+):(.*?)(?=(?:,\w+:)|$)', text.strip('{}'))))
# {'user_id': '34dd833', 'category': 'secondary', 'items': 'camera,vcr,dvd', 'type': 'sg_ser'}
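As a quick check against the reordered input from the question, here is the same expression wrapped in a small function. The name parse_with_regex is made up for this sketch, and it carries the same limits as the pattern itself: keys must consist of \w characters, and a value must not contain a comma followed by word characters and a colon.

```python
import re

# Key: a run of word characters. Value: as little as possible, up to either
# the next ",key:" or the end of the string.
PAIR = re.compile(r'(\w+):(.*?)(?=(?:,\w+:)|$)')


def parse_with_regex(text):
    """Return a dict of key/value strings from '{k:v,k:v1,v2,...}'."""
    return dict(PAIR.findall(text.strip('{}')))


reordered = "{category:secondary,type:sg_ser,user_id:34dd833,items:camera,vcr,dvd}"
result = parse_with_regex(reordered)
```

Because the lookahead also accepts the end of the string, a multi-valued key in last position keeps all of its values.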

Upvotes: 2
