Introduction
I have a dictionary with the following format:
dict_list = {'S0':[[list of int],[list of int]], 'S1':[[list of int],[list of int]], ...}
with S0's list of ints accessed via
dict_list['S0'][0] and dict_list['S0'][1]
To improve code readability, I changed the inner "list of lists" to a "dict of lists" as follows:
dict_dict = {'S0': {'list0': [list of int], 'list1': [list of int]}, ...}
which results in more readable code when accessing the lists:
dict_dict['S0']['list0'] and dict_dict['S0']['list1']
Pickle Problem
However, when I pickled dict_dict and saved it to a file, the file came out larger than the one for dict_list, and the penalty of having the additional dict keys scales in proportion to the number of 'S#' entries. It seems that pickle isn't storing the dict "smartly", as it appears to store each and every dict key separately.
I realize that this is, after all, how pickle should work, since each 'S#' entry could have had a different set of keys to begin with. Pickle has no way of knowing beforehand that dict_dict is actually just a table with regularly repeating fields.
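To make the scaling concrete, here is a minimal sketch that measures the penalty in memory with pickle.dumps instead of writing files. The helper names build and overhead, the fixed list values, and the choice of protocol 4 are assumptions made for illustration; the exact byte counts depend on the Python version and pickle protocol.

```python
import pickle

def build(n, as_dict):
    """Build n entries with fixed values so that the sizes are reproducible."""
    out = {}
    for i in range(n):
        list0 = [1, 2, 3, 4, 5]
        list1 = [1000, 2000, 3000, 4000, 5000]
        if as_dict:
            out[f"S{i}"] = {"list0": list0, "list1": list1}
        else:
            out[f"S{i}"] = [list0, list1]
    return out

def overhead(n, protocol=4):
    """Extra pickled bytes of the dict-of-dicts layout over the dict-of-lists layout."""
    size_dict = len(pickle.dumps(build(n, True), protocol=protocol))
    size_list = len(pickle.dumps(build(n, False), protocol=protocol))
    return size_dict - size_list

# The penalty grows with the number of 'S#' entries.
for n in (8, 16, 32):
    print(n, overhead(n))

# Count how often the key text itself appears in the pickled bytes.
blob = pickle.dumps(build(32, True), protocol=4)
print(blob.count(b"list0"))
```

In this sketch the key text should show up only once in the output, because pickle memoizes a string object it has already written and emits a short back-reference for later occurrences. The per-entry penalty would then come from those back-references rather than from repeated copies of the key names, which is consistent with the penalty still scaling with the number of entries.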
The Question
My question is, is there an alternative to dict_list wherein the list of ints can be accessed by a string key (as in dict_dict) but without the pickle penalty described above?
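For illustration, one direction that might be worth testing is to keep the compact dict_list layout as the pickled format and convert to a named structure only at the load/save boundary. The sketch below is not a definitive answer; the names Pair, dumps_compact and loads_named are hypothetical, and it gives attribute access (.list0) rather than a true string key.

```python
import pickle
from collections import namedtuple

# Hypothetical record type giving names to the two lists.
Pair = namedtuple("Pair", ["list0", "list1"])

def dumps_compact(named):
    """Pickle only plain lists, so the bytes match the dict_list layout."""
    compact = {key: [p.list0, p.list1] for key, p in named.items()}
    return pickle.dumps(compact)

def loads_named(data):
    """Unpickle the compact layout and rebuild the named records."""
    return {key: Pair(*lists) for key, lists in pickle.loads(data).items()}

named = {
    "S0": Pair([1, 2, 3], [1000, 2000, 3000]),
    "S1": Pair([4, 5, 6], [4000, 5000, 6000]),
}
restored = loads_named(dumps_compact(named))
print(restored["S0"].list0)  # [1, 2, 3]
```

The trade-off in this sketch is an extra conversion pass on every save and load, in exchange for a file that is the same size as the dict_list version.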
UPDATE: Experiments Based on Comments Given
3,100 bytes - dict_list['S0'][0] (list.bin)
3,314 bytes - dict_dict['S0']['list0'] (dict.bin)
3,922 bytes - dict_class['S0'].list0 (class.bin)
5,855 bytes - dict_namedtuple['S0'].list0 (namedtuple.bin)
import pickle
from collections import namedtuple
from random import randrange
s_list = ['S0','S1','S2','S3','S4','S5','S6','S7','S8','S9','S10','S11','S12','S13','S14','S15','S0a','S1a','S2a','S3a','S4a','S5a','S6a','S7a','S8a','S9a','S10a','S11a','S12a','S13a','S14a','S15a','AA0','AA1','AA2','AA3','AA4','AA5','AA6','AA7','AA8','AA9','AA10','AA11','AA12','AA13','AA14','AA15','AA0a','AA1a','AA2a','AA3a','AA4a','AA5a','AA6a','AA7a','AA8a','AA9a','AA10a','AA11a','AA12a','AA13a','AA14a','AA15a','BB0','BB1','BB2','BB3','BB4','BB5','BB6','BB7','BB8','BB9','BB10','BB11','BB12','BB13','BB14','BB15','BB0a','BB1a','BB2a','BB3a','BB4a','BB5a','BB6a','BB7a','BB8a','BB9a','BB10a','BB11a','BB12a','BB13a','BB14a','BB15a']
num_of_s_entries = 32
list_length = 5
def pickle_n_save(dict_var, filename):
    # Pickle dict_var and write it to filename; the file is closed automatically.
    with open(filename, "wb") as outfile:
        pickle.dump(dict_var, outfile)
# ------------------------------------------------------------dict_list['S0'][0]
dict_list = {}
for s in s_list[0:num_of_s_entries]:
    dict_list[s] = [[],[]]
    for pts in range(0,list_length):
        dict_list[s][0].append(randrange(1,100))
        dict_list[s][1].append(randrange(1,100)*1000)
pickle_n_save(dict_list, "list.bin")
# -----------------------------------------------------dict_dict['S0']['list0']
dict_dict = {}
for s in dict_list.keys():
    dict_dict[s] = {}
    dict_dict[s]['list0'] = dict_list[s][0]
    dict_dict[s]['list1'] = dict_list[s][1]
pickle_n_save(dict_dict, "dict.bin")
# -------------------------------------------------------dict_class['S0'].list0
class S:
    def __init__(self, list0, list1):
        self.list0 = list0
        self.list1 = list1
dict_class = {}
for s in dict_list.keys():
    dict_class[s] = S(dict_list[s][0],dict_list[s][1])
pickle_n_save(dict_class, "class.bin")
# ---------------------------------------------------dict_namedtuple['S0'].list0
S_namedtuple = namedtuple('S_namedtuple', ['list0','list1'])
dict_namedtuple = {}
for s in dict_list.keys():
    dict_namedtuple[s] = S_namedtuple(dict_list[s][0],dict_list[s][1])
pickle_n_save(dict_namedtuple, "namedtuple.bin")
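To see where the extra bytes go, the pickled stream can be broken down by opcode with the standard pickletools module. This is a minimal sketch on a small stand-in data set; the helper name opcode_histogram, the sample values, and the use of protocol 4 are assumptions, and the opcode names differ under other protocols.

```python
import pickle
import pickletools
from collections import Counter

def opcode_histogram(obj, protocol=4):
    """Count how often each pickle opcode occurs in the pickled form of obj."""
    data = pickle.dumps(obj, protocol=protocol)
    return Counter(op.name for op, arg, pos in pickletools.genops(data))

# Small stand-ins for dict_list and dict_dict with four 'S#' entries.
sample_list = {f"S{i}": [[1, 2], [1000, 2000]] for i in range(4)}
sample_dict = {k: {"list0": v[0], "list1": v[1]} for k, v in sample_list.items()}

hist_list = opcode_histogram(sample_list)
hist_dict = opcode_histogram(sample_dict)

# Print the two histograms side by side: opcode, count in list layout, count in dict layout.
for name in sorted(set(hist_list) | set(hist_dict)):
    print(f"{name:18} {hist_list[name]:4} {hist_dict[name]:4}")
```

Comparing the two columns shows which opcodes appear only in the dict layout, which gives a rough picture of what each additional 'S#' entry costs.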