rel.foo.fighters

Reputation: 402

Useful way to convert string to dictionary using python

I have the below string as input:

'name SP2, status Online, size 4764771 MB, free 2576353 MB, path /dev/sde, log 210 MB, port 5660, guid 7478a0141b7b9b0d005b30b0e60f3c4d, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sde /dev/sdf /dev/sdg, dare 0'

I wrote a function that converts it to a dictionary in Python:

def str_2_json(string):
    str_arr = string.split(',')
    #str_arr{0} = name SP2
    #str_arr{1} = status Online
    json_data = {}
    for i in str_arr:
        #remove whitespaces
        stripped_str = " ".join(i.split())  # i.strip()
        subarray = stripped_str.split(' ')
        #subarray{0}=name
        #subarray{1}=SP2
        key = subarray[0] #key: 'name'
        value = subarray[1] #value: 'SP2'
        json_data[key] = value
        #{dict 0}='name': SP2'
        #{dict 1}='status': online'
    return json_data

The caller then turns the returned dictionary into JSON (it uses jsonify).

Is there a simple/elegant way to do it better?

Upvotes: 1

Views: 446

Answers (4)

Saint

Reputation: 406

You can do this with a regex:

import re

def parseString(s):
    # Each match yields a (key, value) tuple; dict() consumes the pairs.
    return dict(re.findall(r'(?:(\S+) ([^,]+)(?:, )?)', s))

sample = "name SP1, status Offline, size 4764771 MB, free 2406182 MB, path /dev/sdb, log 230 MB, port 5660, guid a48134c00cda2c37005b30b0e40e3ed6, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sdb /dev/sdc /dev/sdd, dare 0"

parseString(sample)

Output:

{'name': 'SP1',
 'status': 'Offline',
 'size': '4764771 MB',
 'free': '2406182 MB',
 'path': '/dev/sdb',
 'log': '230 MB',
 'port': '5660',
 'guid': 'a48134c00cda2c37005b30b0e40e3ed6',
 'clusterUuid': '-8650609094877646407--116798096584060989',
 'disks': '/dev/sdb /dev/sdc /dev/sdd',
 'dare': '0'}
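
For readers who find the one-line pattern hard to read, here is a sketch of the same regex written with re.VERBOSE so that each piece carries a comment. The names PAIR_RE and parse_pairs are illustrative only and not part of the answer above.

```python
import re

# Same pattern as above, spelled out. In verbose mode bare whitespace is
# ignored, so the literal space is written as [ ].
PAIR_RE = re.compile(r"""
    (\S+)       # key: a run of non-whitespace characters
    [ ]         # the single space between key and value
    ([^,]+)     # value: everything up to the next comma
    (?:,[ ])?   # optional separator before the next pair
""", re.VERBOSE)

def parse_pairs(s):
    # findall returns one (key, value) tuple per match
    return dict(PAIR_RE.findall(s))
```

Like the original, this assumes that values never contain a comma.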

Upvotes: 2

tdelaney

Reputation: 77337

Assuming these fields cannot contain internal commas, you can use re.split to both split and remove surrounding whitespace. It looks like you have different types of fields that should be handled differently. I've added a guess at a schema handler based on field names that can serve as a template for converting the various fields as needed.

And as noted elsewhere, no JSON is involved here, so don't use that name.

import re

test = 'name SP2, status Online, size 4764771 MB, free 2576353 MB, path /dev/sde, log 210 MB, port 5660, guid 7478a0141b7b9b0d005b30b0e60f3c4d, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sde /dev/sdf /dev/sdg, dare 0'

def decode_data(string):
    str_arr = re.split(r"\s*,\s*", string)
    data = {}
    for entry in str_arr:
        values = re.split(r"\s+", entry)
        key = values.pop(0)
        # schema processing
        if key in ("disks",): # multivalue keys (trailing comma makes this a tuple, not a string)
            data[key] = values
        elif key in ("size", "free"): # convert to int bytes on 2nd value
            multiplier = {"MB":10**6, "MiB":2**20} # todo: expand as needed
            data[key] = int(values[0]) * multiplier[values[1]]
        else:
            data[key] = " ".join(values)
    return data

decoded = decode_data(test)
for kv in sorted(decoded.items()):
    print(kv)
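
If the if/elif chain grows, one possible variation is a table that maps field names to converter functions. This is only a sketch: SCHEMA, UNITS, to_bytes and decode_with_schema are hypothetical names, and treating "log" as a byte size and "port" as an int is my assumption about the data.

```python
import re

# Hypothetical unit table; extend as needed.
UNITS = {"MB": 10**6, "MiB": 2**20}

def to_bytes(values):
    # ["4764771", "MB"] -> 4764771 * 10**6
    return int(values[0]) * UNITS[values[1]]

# Hypothetical schema: field name -> converter taking the list of tokens.
SCHEMA = {
    "disks": list,                          # keep every token
    "size": to_bytes,
    "free": to_bytes,
    "log": to_bytes,                        # assumption: same unit handling
    "port": lambda values: int(values[0]),  # assumption: port is an integer
}

def decode_with_schema(string):
    data = {}
    for entry in re.split(r"\s*,\s*", string.strip()):
        key, *values = entry.split()
        # Fields without a schema entry fall back to a joined string.
        convert = SCHEMA.get(key, " ".join)
        data[key] = convert(values)
    return data
```

Adding a field then means adding one line to SCHEMA rather than another elif branch.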

Upvotes: 1

pho

Reputation: 25489

Your approach is good, except for a couple weird things:

  • You aren't creating a JSON anything, so to avoid any confusion I suggest you don't name your returned dictionary json_data or your function str_2_json. JSON, or JavaScript Object Notation is just that -- a standard of denoting an object as text. The objects themselves have nothing to do with JSON.
  • You can use i.strip() instead of splitting the string and joining it again (not sure why you did it this way, since you commented out i.strip())
  • Some of your entries contain more than one space (e.g. "size 4764771 MB" or "disks /dev/sde /dev/sdf /dev/sdg"). With your code, you end up losing everything after the second space in such strings. To avoid this, do stripped_str.split(' ', 1), which limits the number of splits to one.
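
To illustrate the first point, here is a minimal sketch of the difference between a dictionary and JSON, using only the standard json module. The variable names are illustrative.

```python
import json

data = {"name": "SP2", "status": "Online"}  # a plain Python dict

text = json.dumps(data)    # serialising produces JSON, which is a str
restored = json.loads(text)  # parsing JSON text gives back a dict

print(type(data).__name__, type(text).__name__)
print(restored == data)
```

The dictionary only becomes JSON at the point where it is serialised, which in your case is presumably what jsonify does.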

Other than that, you could create a dictionary in one line using the dict() constructor and a generator expression:

def str_2_dict(string):
    data = dict(item.strip().split(' ', 1) for item in string.split(','))
    return data

print(str_2_dict('name SP2, status Online, size 4764771 MB, free 2576353 MB, path /dev/sde, log 210 MB, port 5660, guid 7478a0141b7b9b0d005b30b0e60f3c4d, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sde /dev/sdf /dev/sdg, dare 0'))

Outputs:

{
 'name': 'SP2',
 'status': 'Online',
 'size': '4764771 MB',
 'free': '2576353 MB',
 'path': '/dev/sde',
 'log': '210 MB',
 'port': '5660',
 'guid': '7478a0141b7b9b0d005b30b0e60f3c4d',
 'clusterUuid': '-8650609094877646407--116798096584060989',
 'disks': '/dev/sde /dev/sdf /dev/sdg',
 'dare': '0'
}

This is probably the same (practically, in terms of efficiency / time) as writing out the full loop:

def str_2_dict(string):
    data = dict()
    for item in string.split(','):
        key, value = item.strip().split(' ', 1)
        data[key] = value
    return data
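
One caveat with both versions: an entry without a space (a bare key, or an empty entry left by a trailing comma) makes the split produce fewer than two parts, which raises a ValueError. If your input can look like that, a more lenient sketch could use str.partition instead. The name str_2_dict_lenient is made up for this example, and mapping a bare key to an empty string is just one possible choice.

```python
def str_2_dict_lenient(string):
    data = {}
    for item in string.split(','):
        # partition always returns three parts, even when there is no space
        key, _, value = item.strip().partition(' ')
        if key:  # skip empty entries, e.g. after a trailing comma
            data[key] = value.strip()
    return data
```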

Upvotes: 1

Cristian

Reputation: 200080

import json

json_data = json.loads(string)

Upvotes: 0
