Reputation: 1041
I have a list of Excel datasets containing information like the following:
Category   Subcategory  Name
Main Dish  Noodle       Tomato Noodle
Main Dish  Stir Fry     Chicken Rice
Main Dish  Soup         Beef Goulash
Drink      Wine         Bordeaux
Drink      Softdrink    Cola
Suppose the above is only one of the datasets. My desired data structure, using nested dicts and lists, is:
data = {0:{'data':0, 'Category':[
    {'name':'Main Dish', 'Subcategory':[
        {'name':'Noodle', 'key':0, 'data':[{'key':1, 'title':'Tomato Noodle'}]},
        {'name':'Stir Fry', 'key':1, 'data':[{'key':2, 'title':'Chicken Rice'}]},
        {'name':'Soup', 'key':2, 'data':[{'key':3, 'title':'Beef Goulash'}]}]},
    {'name':'Drink', 'Subcategory':[
        {'name':'Wine', 'key':0, 'data':[{'key':1, 'title':'Bordeaux'}]},
        {'name':'Softdrink', 'key':1, 'data':[{'key':2, 'title':'Cola'}]}]}]},
    1:{'data':1, 'Category':.........#Same structure as dataset 0}}
So basically, the whole category is a defaultdict(list), and each distinct category forms a dict within that category list. The subcategories are structured the same way, nested under their category.
I tried to use defaultdict to do it. Here is my code:
from collections import defaultdict
data = defaultdict(dict)
cateList = ["Main Dish", "Drink"]
n = 3 # n means the number of datasets
for i in range(n):
    data[i]['data'] = i
    data[i]['category'] = defaultdict(list)
    for j in range(len(cateList)):
        data[i]['category'][j]['name'] = cateList[j]
        data[i]['category'][j]['subcategory'] = defaultdict(list)
data
But I receive the following error:
TypeError                                 Traceback (most recent call last)
<ipython-input-81-298f7ff30c6a> in <module>()
      5     data[i]['category'] = defaultdict(list)
      6     for j in range(len(cateList)):
----> 7         data[i]['category'][j]['name'] = cateList[j]
      8         data[i]['category'][j]['subcategory'] = defaultdict(list)
      9 data

TypeError: list indices must be integers or slices, not str
This is executed in Jupyter Notebook, and it seems that I can't assign into the nested defaultdict in this way: data[i]['category'][j]['name'] = cateList[j]. So I am not quite sure how to construct the above data structure. Is there a better way?
Thank you very much for your help.
Upvotes: 1
Views: 221
Reputation: 1122352
Your spec states you wanted 'Category' to reference a list:
data = {0:{'data':0, 'Category':[
#                               ^ a list opening bracket
but instead, your code makes it a dictionary:
data[i]['category'] = defaultdict(list)
but the remainder of your code then attempts to treat the 'category' object as a list again, by using j as an index. Because it's a dictionary instead, the expression data[i]['category'][j] produces a list, and data[i]['category'][j]['name'] or data[i]['category'][j]['subcategory'] tries to index that list with a string.
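To see that failure in isolation, here is a minimal sketch that reproduces it outside your loop. The variable names category, entry and error_message are only for illustration:

```python
from collections import defaultdict

category = defaultdict(list)

# Looking up a missing key makes the defaultdict create and store an empty list.
entry = category[0]
print(type(entry).__name__)

# Assigning to a string "index" on that list is what raises the TypeError.
try:
    entry['name'] = 'Main Dish'
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

So the defaultdict is doing exactly what it was told to do: handing out lists. Those lists just can't be used as dictionaries.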
Building this structure really doesn't require a defaultdict; you already know you want to build data, and you are building the nested structures right there with loops. You can just use regular dictionaries and lists:
cateList = ["Main Dish", "Drink"]
n = 3 # n means the number of datasets
data = {}
for i in range(n):
    data[i] = {
        'data': i,
        'category': []
    }
    category = data[i]['category']
    for name in cateList:
        category.append({
            'name': name,
            'subcategory': []
        })
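If you also want to fill in the subcategories from the rows of one dataset, the same plain dict-and-list approach extends naturally. The following is only a sketch under assumptions: the rows are taken to be (category, subcategory, name) tuples already read from Excel, build_categories is a hypothetical helper name, and the key numbering (subcategory keys from 0, item keys from 1, both restarting per category) is my reading of your desired output rather than something you stated:

```python
# Assumed input: one dataset as (category, subcategory, name) tuples.
rows = [
    ("Main Dish", "Noodle", "Tomato Noodle"),
    ("Main Dish", "Stir Fry", "Chicken Rice"),
    ("Main Dish", "Soup", "Beef Goulash"),
    ("Drink", "Wine", "Bordeaux"),
    ("Drink", "Softdrink", "Cola"),
]


def build_categories(rows):
    """Hypothetical helper: group rows into the nested list-of-dicts layout."""
    categories = []
    by_name = {}  # category name -> its dict, so repeated names are reused
    for category_name, subcategory_name, title in rows:
        category = by_name.get(category_name)
        if category is None:
            category = {'name': category_name, 'subcategory': []}
            by_name[category_name] = category
            categories.append(category)

        subcategories = category['subcategory']
        subcategory = next(
            (s for s in subcategories if s['name'] == subcategory_name), None
        )
        if subcategory is None:
            # Assumption: subcategory keys count from 0 within a category.
            subcategory = {
                'name': subcategory_name,
                'key': len(subcategories),
                'data': [],
            }
            subcategories.append(subcategory)

        # Assumption: item keys count from 1 across the whole category.
        item_key = sum(len(s['data']) for s in subcategories) + 1
        subcategory['data'].append({'key': item_key, 'title': title})
    return categories


data = {0: {'data': 0, 'category': build_categories(rows)}}
```

If your real key numbering follows a different rule, only the two lines marked as assumptions need to change.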
I'm not quite sure why you are building an outer dictionary with integer keys starting at 0. You could just make that a list too.
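As a rough sketch of that list variant, reusing your cateList and n, the whole structure can be written as a nested list comprehension, and the dataset number is then simply the list index:

```python
cateList = ["Main Dish", "Drink"]
n = 3  # n means the number of datasets

# Each dataset gets its own freshly built category list, so nothing is shared.
data = [
    {
        'data': i,
        'category': [{'name': name, 'subcategory': []} for name in cateList],
    }
    for i in range(n)
]
```

With this layout, data[1]['category'][0]['name'] gives 'Main Dish' for the second dataset.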
Upvotes: 2