How can i read a python dict contained in a csv file and store the data in a pandas dataframe?

Question

I have a csv where each row is a dictionary, inside each row's dict, there is a list, and this list contains a sublist and a subdict. Each sublist has 2 elements, and the subdict has 100 keys, and a value per key. This is a screenshot of the data:

Here's a sample of the data in text format:

{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"1": [[10.8, 36.1], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"2": [[10.8, 36.2], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"3": [[10.8, 36.300000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"4": [[10.8, 36.4], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"5": [[10.8, 36.5], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"6": [[10.8, 36.6], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"7": [[10.8, 36.7], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"8": [[10.8, 36.800000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}
{"9": [[10.8, 36.9], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}

What i would like to do is to read this into a pandas dataframe that would produce an expected output like this (i will just type a single row for the sake of simplicity):

list_elemnt_1   list_elemnt_2  key_0,  key_1,  key_2,  key_3,  key_4,  and so on...
        value           value  value   value   value   value   value   and so on...

For each row in the csv, i would like to build a dataframe with one column per sublist value (2), and one column for each key in the subdict contained on the row's dict.

How could i do this?? Please feel free to ask more information if needed.

Thank you very much in advance

EDIT

Key_0, key_1, key_2, etc... are the subdict keys, not the master dict keys

zero · Accepted Answer

import ast
import pandas as pd

file = open('file_55966371.csv', 'r')

lines = [ast.literal_eval(line) for line in file]

def clean_lines(line):
    value = [v for v in line.values()]

    l1, l2 = value[0][0]

    line_dict = value[0][1]

    line_dict = {f'key_{key}': value for key, value in line_dict.items()}

    line_dict['list_element1'] = l1
    line_dict['list_element2'] = l2

    return line_dict

to_read = [clean_lines(line) for line in lines]

df = pd.DataFrame(to_read)

I agree with @furas this looks a lot like a JSON, and if this data was sourced from someone, it would be best if you could ask them if they could send it to you in JSON format.

If not, the code above works.

Open the file.
read each line and store it as a list. ast.literal_eval allows Python to recognize that it's a dictionary from the get-go and stores them as dict objects.
i created a helper clean_lines function. clean_lines is the more important part.
1. get the values (i.e. the list with a sublist and a subdict)
2. unpack the list into two variables l1 and l2
3. rename the key for the subdict (to your specs of key_X) d. add l1 and l2 as entries into the dictionary, basically combining the sublist and subdict into a single dictionary

Once you have a list of dictionaries, pandas will be able to recognize it and you can plug it into a pd.DataFrame instantiator object

How can i read a python dict contained in a csv file and store the data in a pandas dataframe?

Answers (2)

Related Questions