How to vectorize a nested for loop in python

Question

I am mostly concerned with efficiency. I have a very long list of IDs and a second, shorter list of IDs. For each ID in the shorter list, I want to store its position in the longer list (each ID should only appear once in each list). I have written a nested for loop to do this, but since the shorter list contains over 1000 elements and the longer list contains over 80k elements, the code below takes a very long time (though it works).

IDD1 = [0] * leng
IDD2 = [0] * leng
## Match IDs to position in table
for i in range(leng):
    for j in range(len(halo_id)):
        if ID1[i] == halo_id[j]:
            IDD1[i] = j
        if ID2[i] == halo_id[j]:
            IDD2[i] = j

If it's of any relevance, the IDs originally come from a halotools halo catalog table.

Edit:

The data is literally just a list of integers in both cases. The result I want is a list of integers (indices). ID1 and ID2 are essentially the same thing; I just need to operate on both of them the same way. They are lists of integers I have from earlier. halo_id is the same but much longer.

Ignacio Vazquez-Abrams · Accepted Answer

First, create a mapping of ID to position:

idmap = {i: e for (e, i) in enumerate(halo_id)}

Then iterate over the smaller list and put it through the mapping:

IDD1 = [idmap[el] for el in ID1]
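
Putting the two steps together, a minimal self-contained sketch might look like the following. The names ID1, ID2, IDD1, IDD2 and halo_id are taken from the question, but the sample values are made up for illustration. It also assumes that every ID in the shorter lists actually occurs in halo_id; a missing ID would raise a KeyError.

```python
# Hypothetical sample data; the real lists would come from the halo catalog.
halo_id = [105, 101, 104, 102, 103]
ID1 = [101, 103]
ID2 = [104, 105]

# Build the ID -> position mapping with a single pass over the long list.
idmap = {hid: pos for pos, hid in enumerate(halo_id)}

# One dictionary lookup per element of each shorter list.
IDD1 = [idmap[el] for el in ID1]
IDD2 = [idmap[el] for el in ID2]

print(IDD1)  # [1, 4]
print(IDD2)  # [2, 0]
```

The mapping is built once and reused for both ID1 and ID2, so the long list is only traversed a single time.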

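This reduces the operation from O(n*m) to O(n+m), where n and m are the lengths of the two lists, because each dictionary lookup takes constant time on average.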
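
To make the difference concrete, here is a rough sketch that checks both approaches agree on small random data and then works out the step counts for the sizes mentioned in the question. The function names positions_nested and positions_dict are hypothetical, introduced only for this illustration, and the step counts are back-of-the-envelope figures rather than measured timings.

```python
import random

def positions_nested(ids, table):
    """Nested-loop approach from the question: n*m comparisons."""
    out = [0] * len(ids)
    for i in range(len(ids)):
        for j in range(len(table)):
            if ids[i] == table[j]:
                out[i] = j
    return out

def positions_dict(ids, table):
    """Dictionary approach from the answer: about n+m steps on average."""
    idmap = {t: pos for pos, t in enumerate(table)}
    return [idmap[el] for el in ids]

# Small made-up data set with unique IDs, so both approaches must agree.
random.seed(0)
table = random.sample(range(10_000), 500)
ids = random.sample(table, 50)
same = positions_nested(ids, table) == positions_dict(ids, table)

# Approximate step counts for the sizes given in the question.
n, m = 1_000, 80_000
nested_steps = n * m   # 80,000,000 comparisons
dict_steps = n + m     # 81,000 insertions and lookups
```

With about 1000 IDs to look up in a table of about 80k, that is roughly 80 million comparisons for the nested loop versus about 81 thousand dictionary operations.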
