Reputation: 1293
I'm looking for an efficient way to store pointers into a file in a multi-dimensional list. My function is triggered as soon as a job is sent to it. With each job I get three values: seq_id, cls_id and au_id. The first job initialises pointers_list[seq_id][cls_id][au_id] as follows:
files = {}
pointers_list = []
pointers_list_flag = False
def worker(body):
    data = body['data']
    file_id = body['file_id']
    seq_id = body['seq_id']
    cls_id = body['cls']
    au_id = body['au_id']
    if (file_id in files):
        pointers_list_flag = False
        files[file_id].append(body['du'])
    else:
        # first job
        files[file_id] = [body['du']]
        # do other stuff only the first time
        [...]
        # init the pointers_list
        pointers_list.append([])
        pointers_list[seq_id].append([])
        pointers_list[seq_id][cls_id].append([])
        pointers_list[seq_id][cls_id][au_id] = 0
        pointers_list_flag = True
    if not pointers_list_flag:
        # the following jobs update the pointers_list
        current_pointer = getcurrentpointer()
        pointers_list.append([])
        pointers_list[seq_id].append([])
        pointers_list[seq_id][cls_id].append([])
        pointers_list[seq_id][cls_id][au_id] = current_pointer
Let's suppose my first job has seq_id = 0, cls_id = 1 and au_id = 0. Obviously I get an "index out of range" error when I try pointers_list[<seq_id=0>][<cls_id=1>].append([]), because I'm trying to access pointers_list[0][1] while I've only initialised pointers_list[0][0]. The issue is that I can't know in advance either the length or the values of any of the keys.
Any hints? Should I use dictionaries?
EDIT 1:
The dictionary files holds the jobs already processed. The list pointers_list instead holds the pointer (file.tell()) into the output file for the job identified by seq_id, cls_id and au_id. I need this because I receive the jobs in random order, but I have to write the jobs' data to the output file in a specific order, given precisely by seq_id, cls_id and au_id. Hence the need for a structure holding the pointers. If I first receive the job with seq_id=0, cls_id=1 and au_id=0, I need my list to keep the pointer to the position in the file where I started writing its data. When a new job arrives, say with seq_id=0, cls_id=0 and au_id=0, its data has to be written to the left of the current data in the output file. So I have to read pointers_list[0][1][0], get the point in the file where the existing data starts, shift that data by the size of the new data and then write the new data. Finally, I need to update pointers_list[seq_id][cls_id][au_id] for the data that has been shifted by the new data.
I need the pointers_list_flag to prevent the first job from updating the pointers_list again.
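To make the intended behaviour concrete, here is a minimal sketch of the shift-and-update step described above. The insert_before() helper, the file name and the flat dict of pointers are made up purely for illustration; they are not part of my actual code:
import collections  # not required for the sketch itself, only to stress it is standalone

def insert_before(path, pointer, new_data):
    """Insert new_data at byte offset `pointer`, shifting the existing data to the right."""
    with open(path, 'r+b') as f:
        f.seek(pointer)
        tail = f.read()          # everything that has to be shifted
        f.seek(pointer)
        f.write(new_data)        # new data goes where the old data started
        f.write(tail)            # old data follows, shifted by len(new_data)
    return len(new_data)         # every stored pointer >= `pointer` must grow by this

# The job (0, 1, 0) arrived first and was written at offset 0;
# the job (0, 0, 0) arrives later and must end up before it.
with open('output.bin', 'wb') as f:
    f.write(b'JOB-0-1-0')

pointers = {(0, 1, 0): 0}
shift = insert_before('output.bin', pointers[(0, 1, 0)], b'JOB-0-0-0')
pointers[(0, 1, 0)] += shift     # the shifted job's pointer is updated
pointers[(0, 0, 0)] = 0          # the new job now starts where the old one did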
Upvotes: 0
Views: 95
Reputation: 682
You can use collections.defaultdict for this purpose (https://docs.python.org/2/library/collections.html#collections.defaultdict). defaultdict gives you a dict-like object, but whenever you access a key that is not already in the defaultdict, a default value is created for it.
So for this example:
from collections import defaultdict

files = defaultdict(list)
pointers_list = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: 0)))

def worker(body):
    data = body['data']
    file_id = body['file_id']
    seq_id = body['seq_id']
    cls_id = body['cls']
    au_id = body['au_id']
    if (file_id in files):
        current_pointer = getcurrentpointer()
        pointers_list[seq_id][cls_id][au_id] = current_pointer
    else:
        # do other things here
        [...]
        # Automatically creates the entries in pointers_list for seq_id -> cls_id -> au_id
        pointers_list[seq_id][cls_id][au_id] = 0
    # If file_id is in files this appends to the list already there,
    # otherwise it creates an empty list for the file_id entry and appends to that
    files[file_id].append(body['du'])
Note: the lambda: functions serve as the default factories, so each missing seq_id/cls_id level is created as a fresh nested defaultdict and each missing au_id defaults to 0. pointers_list_flag is actually redundant, as the file_id in files check handles everything.
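As a quick illustration of that auto-creation behaviour (the indices below are made up):
from collections import defaultdict

pointers_list = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: 0)))

pointers_list[0][1][0] = 1234   # intermediate levels are created on first access
print(pointers_list[0][1][0])   # 1234
print(pointers_list[5][7][9])   # 0 -- the innermost default, no "index out of range"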
Upvotes: 1