user123892

Reputation: 1293

How to address multi-dimensional lists in Python with random indexes

I'm looking for an efficient way to store different file pointers in a multi-dimensional list. My function is triggered as soon as a job is sent to it. Each job carries three values: seq_id, cls_id and au_id. The first job initialises pointers_list[seq_id][cls_id][au_id] as follows:

files = {}
pointers_list = []
pointers_list_flag = False

def worker(body):
    data = body['data']   
    file_id = body['file_id']
    seq_id = body['seq_id']
    cls_id = body['cls']
    au_id = body['au_id']

    if (file_id in files):
        pointers_list_flag = False
        files[file_id].append(body['du']) 
    else: 
        # first job
        files[file_id] = [body['du']]
        # do other stuffs only the first time
        [...]
        #init the pointers_list
        pointers_list.append([])
        pointers_list[seq_id].append([])
        pointers_list[seq_id][cls_id].append([])
        pointers_list[seq_id][cls_id][au_id] = 0
        pointers_list_flag = True
    if not pointers_list_flag:
        #the following jobs update the pointers_list
        current_pointer = getcurrentpointer()
        pointers_list.append([])
        pointers_list[seq_id].append([])
        pointers_list[seq_id][cls_id].append([])
        pointers_list[seq_id][cls_id][au_id] = current_pointer

Suppose my first job has seq_id = 0, cls_id = 1 and au_id = 0. I get an "index out of range" error when I try to run

    pointers_list[<seq_id=0>][<cls_id=1>].append([])

because I'm trying to access pointers_list[0][1], while I've only initialised pointers_list[0][0]. The issue is that I can't know in advance either the number or the values of the keys. Any hints? Should I use dictionaries?

EDIT 1:

The dictionary files holds the jobs already processed. The list pointers_list instead holds, for the job with a given seq_id, cls_id and au_id, its pointer (file.tell()) into the output file. I need this because I receive the jobs in random order, but I have to write their data to the output file in a specified order, given by seq_id, cls_id and au_id. Hence the need for a list to hold the pointers. If I first receive the job with seq_id=0, cls_id=1 and au_id=0, my list must keep the position in the file where I started to write its data. When I receive a new job, say with seq_id=0, cls_id=0 and au_id=0, the new data has to be written before the existing data in the output file. So I have to read pointers_list[0][1][0], get the position in the file where that data starts, shift the data by the size of the new data, and then write the new data. Finally, I need to update pointers_list[seq_id][cls_id][au_id] for the data that was shifted.

I need pointers_list_flag to prevent the first job from updating pointers_list a second time.

Upvotes: 0

Views: 95

Answers (1)

NickHilton

Reputation: 682

You can use collections.defaultdict for this purpose (https://docs.python.org/2/library/collections.html#collections.defaultdict).

defaultdict gives you a dict-like object in which looking up a key that is not yet present creates a default value for that key.
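As a minimal sketch of that behaviour, separate from the worker code (the name pointers and the values 128, 5, 2 and 7 are made up for illustration):

```python
from collections import defaultdict

# Three levels of nesting: seq_id -> cls_id -> au_id -> pointer (default 0)
pointers = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

# No "index out of range": the intermediate levels are created on first access
pointers[0][1][0] = 128

stored = pointers[0][1][0]     # the value assigned above
missing = pointers[5][2][7]    # never assigned, so the default (0) is returned
has_seq_5 = 5 in pointers      # True: merely reading pointers[5] created the entry
```

Note that a plain lookup is enough to create an entry, so use the in operator or .get() when you only want to test whether a key exists.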

So for this example:

from collections import defaultdict

files = defaultdict(list)
pointers_list = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: 0)))

def worker(body):
    data = body['data']   
    file_id = body['file_id']
    seq_id = body['seq_id']
    cls_id = body['cls']
    au_id = body['au_id']

    if (file_id in files):
        current_pointer = getcurrentpointer()
        pointers_list[seq_id][cls_id][au_id] = current_pointer   
    else: 
        # do other things here
        [...]
        # Automatically creates entry in pointers list for seq_id -> cls_id -> au_id
        pointers_list[seq_id][cls_id][au_id] = 0     

    # If file_id in files then appends to list already there 
    # otherwise creates empty list for file_id entry and appends to the empty list    
    files[file_id].append(body['du']) 

Note:

  • A defaultdict argument must be a callable, which is why you have to use the lambda functions
  • The pointers_list_flag is actually redundant, as the file_id in files check handles everything
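Since the jobs eventually have to be written in seq_id, cls_id, au_id order, a different option that may be worth considering (this is not the nested defaultdict above, just a sketch) is a single plain dict keyed by the tuple (seq_id, cls_id, au_id). The helper record_pointer and the pointer values below are hypothetical:

```python
# Hypothetical alternative: one flat dict keyed by (seq_id, cls_id, au_id)
pointers = {}

def record_pointer(seq_id, cls_id, au_id, pointer):
    """Store the file position for one job; no pre-initialisation is needed."""
    pointers[(seq_id, cls_id, au_id)] = pointer

# Jobs arriving in random order (pointer values are made up)
record_pointer(0, 1, 0, 0)
record_pointer(1, 0, 2, 1024)
record_pointer(0, 0, 0, 512)

# Tuples compare element by element, so sorting the keys orders them
# by seq_id first, then cls_id, then au_id
ordered_keys = sorted(pointers)
```

Sorting the keys then gives the order in which the data should appear in the output file, without having to walk three nested levels.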

Upvotes: 1
