martin

Reputation: 885

Converting a string to a dataframe-readable format

I have many strings like this:

"[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]"

Since I'm working with a dataframe, I need to convert them into JSON (that's what the format looks like) so I can access and flatten the data. How can this be done?

EDIT: I realised that it's not JSON (it uses single quotes), but I still don't know how to convert it to a list of dictionaries or similar so I can manipulate it.

Upvotes: 0

Views: 59

Answers (3)

BenjaminK

Reputation: 783

As this could be a repetitive task, it's probably a good idea to wrap it in a function.

import json  # Import json module to work with json data
import ast


data = "[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]"


def clean_data_for_json_loads(input_data):
    """Safely parse a Python-literal string (also from untrusted sources)
    with ast.literal_eval and return it re-encoded as a JSON string."""
    evaluated_data = ast.literal_eval(input_data)  # str -> Python list/dict
    json_object_as_string = json.dumps(evaluated_data)  # Python object -> JSON str
    return json_object_as_string

json_string = clean_data_for_json_loads(data)


# Load json data from a string; the (s) in loads stands for string, which helps to remember the difference from json.load
json_data = json.loads(json_string)
print(json_data)
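For this input, the `json.dumps`/`json.loads` round trip gives back exactly what `ast.literal_eval` already returned, so the JSON step only matters if you actually need a JSON string (for example to write it to a file). A minimal sketch comparing the two; note that types JSON lacks, such as tuples, do not survive the round trip unchanged:

```python
import ast
import json

data = "[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]"

# Parse the Python literal directly
direct = ast.literal_eval(data)

# Round trip through JSON, as clean_data_for_json_loads() followed by json.loads() does
round_tripped = json.loads(json.dumps(direct))

# For lists, dicts, strings and ints the two results are equal
print(direct == round_tripped)  # True

# A tuple comes back as a list, because JSON has no tuple type
print(json.loads(json.dumps(ast.literal_eval("(1, 2)"))))  # [1, 2]
```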

Upvotes: -1

sarartur

Reputation: 1228

It looks like the data is almost JSON, but JSON requires double quotes (not single quotes) around keys and string values. You can fix this by running:

data_string = "[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]"
json_string = data_string.replace("'", '"')

You now have a JSON string!
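This blanket replace assumes that no key or value itself contains a quote character. A sketch of the failure mode, using a hypothetical genre name with an apostrophe, compared with `ast.literal_eval` on the same string:

```python
import ast
import json

# Hypothetical input: one value contains an apostrophe, so Python's repr used double quotes for it
data_string = "[{'id': 1, 'name': \"Children's\"}]"

# The blanket replace turns "Children's" into "Children"s", which is not valid JSON
naive = data_string.replace("'", '"')
naive_failed = False
try:
    json.loads(naive)
except json.JSONDecodeError as exc:
    naive_failed = True
    print("replace() produced invalid JSON:", exc.msg)

# ast.literal_eval reads the original Python-literal string as-is
parsed = ast.literal_eval(data_string)
print(parsed[0]["name"])  # Children's
```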

If you need to convert the string to Python data structures, you can do the following:

import json

data = json.loads(json_string)
print(data[0]['id']) # 10749

Upvotes: 0

jkr

Reputation: 19300

You can use ast.literal_eval:

import ast
x = ast.literal_eval("[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]")
x[0]["name"]  # evaluates to 'Romance'
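Since the question mentions many such strings and flattening, here is a sketch that parses each string with `ast.literal_eval` and flattens the result into one record per genre. The `rows` sample, the `movie_id` field and the `flatten_genres` helper are hypothetical; the resulting list of dicts is in a shape that could be passed to a dataframe constructor.

```python
import ast

# Hypothetical rows: (movie_id, genres string) pairs, as they might appear in a dataframe column
rows = [
    (1, "[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]"),
    (2, "[{'id': 35, 'name': 'Comedy'}]"),
    (3, "[]"),
]


def flatten_genres(rows):
    """Return one flat record per (movie, genre) pair."""
    records = []
    for movie_id, genres_str in rows:
        # Each string is a Python literal (single quotes), so parse it with literal_eval
        for genre in ast.literal_eval(genres_str):
            records.append(
                {"movie_id": movie_id, "genre_id": genre["id"], "genre_name": genre["name"]}
            )
    return records


records = flatten_genres(rows)
for record in records:
    print(record)
```

Movie 3 has an empty genre list and therefore produces no records. With pandas, and assuming a column named `genres`, the parsing step is usually written as `df["genres"].apply(ast.literal_eval)`, after which `explode` and `pandas.json_normalize` can do the flattening.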

From the documentation:

Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing.
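To illustrate the "safely" part: literals are parsed normally, while anything that is not a literal, such as a function call, is rejected with a `ValueError` instead of being executed. A minimal sketch:

```python
import ast

# Plain literals and containers are parsed normally
genre = ast.literal_eval("{'id': 35, 'name': 'Comedy'}")
print(genre["name"])  # Comedy

# A function call is not a literal, so literal_eval raises ValueError instead of running it
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```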

Upvotes: 2
