ggupta

Reputation: 59

Parse a list-like string with dict elements in Python

I want to parse the list-like string below (I call it a string because its type is str) and extract some information from its dict elements:

 "[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""CL0000000423"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""BRLATMBDR001"", ""name"": ""LATAM Airlines Group SA""}]"

I used the ast package and literal_eval to convert it into a list and iterate over it, but I get ValueError: malformed string.

Below is the code:

import ast

# Raises ValueError: malformed string for the input above.
company_list = ast.literal_eval(line[18])
print company_list
for i in company_list:
    #print type(i)
    print i["isin"]

Here line[18] is the string above.

Alternatively, how can I ignore such a list-like string if it contains a null value, as this one does?

PS: line[18] is the CSV column I want to read.

Upvotes: 0

Views: 59

Answers (1)

Darkstarone

Reputation: 4730

OK, I'll start off by saying: wow, that was way harder than I thought it was going to be!

So two problems with the string:

  1. When the string is pasted into Python source as shown, all the double quotes disappear: Python concatenates adjacent string literals, so each "" simply joins two pieces. We have to add the quotes back in.
  2. null doesn't exist in Python, so we need to change it to None.
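Both problems can be reproduced in isolation. The sketch below is written for Python 3 and uses a shortened, made-up sample rather than the full string from the question; the variable names are only illustrative.

```python
import ast

# Adjacent string literals are concatenated, so the doubled quotes vanish
# when the text is pasted into source code as-is.
pasted = "{""isin"": null}"
print(pasted)

# Even with the quotes intact, `null` is not a Python literal,
# so ast.literal_eval rejects the string with a ValueError.
quoted = '{"isin": null}'
try:
    ast.literal_eval(quoted)
    failed = False
except ValueError:
    failed = True
print(failed)
```

The first print shows the text with no quotes left at all, which is why the code below has to put them back before calling literal_eval.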

So here's the code:

import re
import ast

data_in = "[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""CL0000000423"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""BRLATMBDR001"", ""name"": ""LATAM Airlines Group SA""}]"

# Make a copy for modification.
formatted_data = data_in

# Tracks how far later positions have shifted because of the edits so far.
offset = 0

# Finds every key and value: a run of word characters or whitespace after '{', ':' or ','.
p = re.compile(r"[{:,]([\w\s]{2,})")
for m in p.finditer(data_in):
    # Counts the number of characters removed via strip().
    strip_val = len(m.group(1)) - len(m.group(1).strip())
    # Wraps the stripped match in double quotes.
    formatted_data = formatted_data[:m.start(1)+offset] + "\"" + m.group(1).strip() + "\"" + formatted_data[m.end(1)+offset:]
    # Each match adds 2 characters (the quotes), minus the whitespace removed.
    offset += 2 - strip_val

company_list = ast.literal_eval(formatted_data)

# Finds 'null' values and replaces them with None.
for item in company_list:
    for k,v in item.iteritems():
        if v == 'null':
            item[k] = None

print company_list

It was written in Python 3 and I changed the bits I remembered back to Python 2, so there might be small errors.

The result is a list of dict objects:

[{'isin': 'US51817R1068', 'name': 'LATAM Airlines Group SA'}, {'isin': 'CL0000000423', 'name': 'LATAM Airlines Group SA'}, {'isin': None, 'name': 'LATAM Airlines Group SA'}, {'isin': 'BRLATMBDR001', 'name': 'LATAM Airlines Group SA'}]
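The question also asks how to skip the entries with a null value. Once those are None, one possible way is a simple filter. This is a minimal sketch in Python 3 on a shortened copy of the result above; with_isin is just an illustrative name.

```python
# Shortened copy of the parsed result shown above.
company_list = [
    {'isin': 'US51817R1068', 'name': 'LATAM Airlines Group SA'},
    {'isin': None, 'name': 'LATAM Airlines Group SA'},
]

# Keep only the entries that actually carry an ISIN.
with_isin = [item for item in company_list if item['isin'] is not None]
for item in with_isin:
    print(item['isin'])
```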

For more info on the regex used, see here.
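As an alternative to the regex approach (not what the code above does), the string looks like a JSON array stored in a CSV field, where "" is the CSV escape for a single quote. Assuming that is the case, the csv module undoes the escaping and json.loads maps null to None directly. The sketch below is Python 3; the one-column sample row is made up for illustration, whereas in the question the field would be line[18].

```python
import csv
import io
import json

# Hypothetical CSV row with a single quoted field, shortened from the question.
raw = '"[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}]"\n'

for line in csv.reader(io.StringIO(raw)):
    # csv.reader turns each doubled quote back into a single quote,
    # leaving valid JSON in the field.
    company_list = json.loads(line[0])
    for item in company_list:
        print(item["isin"])
```

If the file is read with plain string splitting instead of csv.reader, the doubled quotes stay in the field and this approach would need an extra unescaping step.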

Upvotes: 1
