Reputation: 1440
I have a csv where each row is a dictionary, inside each row's dict, there is a list, and this list contains a sublist and a subdict. Each sublist has 2 elements, and the subdict has 100 keys, and a value per key. This is a screenshot of the data:
Here's a sample of the data in text format:
{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"1": [[10.8, 36.1], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"2": [[10.8, 36.2], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"3": [[10.8, 36.300000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"4": [[10.8, 36.4], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"5": [[10.8, 36.5], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"6": [[10.8, 36.6], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"7": [[10.8, 36.7], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"8": [[10.8, 36.800000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}
{"9": [[10.8, 36.9], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}
What i would like to do is to read this into a pandas dataframe that would produce an expected output like this (i will just type a single row for the sake of simplicity):
list_elemnt_1 list_elemnt_2 key_0, key_1, key_2, key_3, key_4, and so on...
value value value value value value value and so on...
For each row in the csv, i would like to build a dataframe with one column per sublist value (2), and one column for each key in the subdict contained on the row's dict.
How could i do this?? Please feel free to ask more information if needed.
Thank you very much in advance
EDIT
Key_0, key_1, key_2, etc... are the subdict keys, not the master dict keys
Upvotes: 0
Views: 1114
Reputation: 4055
Not the best way to do it.
# Edit for reading the csv
# there are two ways to go about it, I am assuming data is in 1 column
df_csv = pd.read_csv('/path/to/your/file/filename.csv')
# read in the csv, I assume you are able to do this.
list_of_dfs = []
for idx, row in df_csv.iterrows():
d = row[column_name] # find the column name and insert here
df = pd.DataFrame.from_dict(d,orient='index') # creating a dataframe to get the number of lines
remove_cols = df.columns
for i in d.keys():
df['list_elemnt_1'] = d[i][0][0]
df['list_elemnt_2'] = d[i][0][1]
for key in d[i][1].keys():
df[key] = d[i][1][key]
# remove the original cols here
list_of_dfs.append(df)
This will give you the df
s of each line as a line elemnt in the list_of_dfs
which I assumed is the goal?
Let me know if it works.
Upvotes: 1
Reputation: 1725
import ast
import pandas as pd
file = open('file_55966371.csv', 'r')
lines = [ast.literal_eval(line) for line in file]
def clean_lines(line):
value = [v for v in line.values()]
l1, l2 = value[0][0]
line_dict = value[0][1]
line_dict = {f'key_{key}': value for key, value in line_dict.items()}
line_dict['list_element1'] = l1
line_dict['list_element2'] = l2
return line_dict
to_read = [clean_lines(line) for line in lines]
df = pd.DataFrame(to_read)
I agree with @furas this looks a lot like a JSON, and if this data was sourced from someone, it would be best if you could ask them if they could send it to you in JSON format.
If not, the code above works.
Open the file.
read each line and store it as a list. ast.literal_eval
allows Python to recognize that it's a dictionary from the get-go and stores them as dict
objects.
i created a helper clean_lines
function. clean_lines is the more important part.
l1
and l2
key_X
)
d. add l1 and l2 as entries into the dictionary, basically combining the sublist and subdict into a single dictionaryOnce you have a list of dictionaries, pandas will be able to recognize it and you can plug it into a pd.DataFrame
instantiator object
Upvotes: 2