Miguel 2488
Miguel 2488

Reputation: 1440

How can i read a python dict contained in a csv file and store the data in a pandas dataframe?

I have a csv where each row is a dictionary, inside each row's dict, there is a list, and this list contains a sublist and a subdict. Each sublist has 2 elements, and the subdict has 100 keys, and a value per key. This is a screenshot of the data:

enter image description here

Here's a sample of the data in text format:

{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"1": [[10.8, 36.1], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"2": [[10.8, 36.2], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"3": [[10.8, 36.300000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"4": [[10.8, 36.4], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"5": [[10.8, 36.5], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"6": [[10.8, 36.6], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"7": [[10.8, 36.7], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"8": [[10.8, 36.800000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}
{"9": [[10.8, 36.9], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}

What i would like to do is to read this into a pandas dataframe that would produce an expected output like this (i will just type a single row for the sake of simplicity):

list_elemnt_1   list_elemnt_2  key_0,  key_1,  key_2,  key_3,  key_4,  and so on...
        value           value  value   value   value   value   value   and so on...

For each row in the csv, i would like to build a dataframe with one column per sublist value (2), and one column for each key in the subdict contained on the row's dict.

How could i do this?? Please feel free to ask more information if needed.

Thank you very much in advance

EDIT

Key_0, key_1, key_2, etc... are the subdict keys, not the master dict keys

Upvotes: 0

Views: 1114

Answers (2)

Sid
Sid

Reputation: 4055

Not the best way to do it.

# Edit for reading the csv

# there are two ways to go about it, I am assuming data is in 1 column
df_csv = pd.read_csv('/path/to/your/file/filename.csv')


# read in the csv, I assume you are able to do this.
list_of_dfs = []
for idx, row in df_csv.iterrows():
      d = row[column_name]   # find the column name and insert here
      df = pd.DataFrame.from_dict(d,orient='index') # creating a dataframe to get the number of lines

     remove_cols = df.columns

    for i in d.keys():
         df['list_elemnt_1'] = d[i][0][0]
         df['list_elemnt_2'] = d[i][0][1]
         for key in d[i][1].keys():
               df[key] = d[i][1][key]

         # remove the original cols here
         list_of_dfs.append(df)


This will give you the dfs of each line as a line elemnt in the list_of_dfs which I assumed is the goal? Let me know if it works.

Upvotes: 1

zero
zero

Reputation: 1725

import ast
import pandas as pd

file = open('file_55966371.csv', 'r')

lines = [ast.literal_eval(line) for line in file]

def clean_lines(line):
    value = [v for v in line.values()]

    l1, l2 = value[0][0]

    line_dict = value[0][1]

    line_dict = {f'key_{key}': value for key, value in line_dict.items()}

    line_dict['list_element1'] = l1
    line_dict['list_element2'] = l2

    return line_dict

to_read = [clean_lines(line) for line in lines]

df = pd.DataFrame(to_read)

I agree with @furas this looks a lot like a JSON, and if this data was sourced from someone, it would be best if you could ask them if they could send it to you in JSON format.

If not, the code above works.

  • Open the file.

  • read each line and store it as a list. ast.literal_eval allows Python to recognize that it's a dictionary from the get-go and stores them as dict objects.

  • i created a helper clean_lines function. clean_lines is the more important part.

    1. get the values (i.e. the list with a sublist and a subdict)
    2. unpack the list into two variables l1 and l2
    3. rename the key for the subdict (to your specs of key_X) d. add l1 and l2 as entries into the dictionary, basically combining the sublist and subdict into a single dictionary

Once you have a list of dictionaries, pandas will be able to recognize it and you can plug it into a pd.DataFrame instantiator object

Upvotes: 2

Related Questions