Gabriel

Reputation: 42329

Speed up search of array element in second array

I have a pretty simple operation involving two not so large arrays:

  1. For every element at position i in the first (larger) array,
  2. check whether it exists in the second (smaller) array.
  3. If it does, find its index j in the second array.
  4. Take the float at position i of a third array (same length as the first) and store it at position j of a fourth array (same length as the second).

The for block below works, but becomes very slow even for moderately sized arrays (>10000 elements).

Can this implementation be made faster?

import numpy as np
import random

##############################################
# Generate some random data.
# 'Nb' is always smaller than 'Na'.
Na, Nb = 50000, 40000

# List of IDs (could be any string, I use integers here for simplicity)
ids_a = random.sample(range(1, Na * 10), Na)
ids_a = [str(_) for _ in ids_a]
random.shuffle(ids_a)
# Some floats associated to these IDs
vals_in_a = np.random.uniform(0., 1., Na)

# Smaller list of repeated IDs from 'ids_a'
ids_b = random.sample(ids_a, Nb)
# Array to be filled
vals_in_b = np.zeros(Nb)
##############################################

# This block needs to be *a lot* more efficient
#
# For each string in 'ids_a'
for i, id_a in enumerate(ids_a):
    # if it exists in 'ids_b'
    if id_a in ids_b:
        # find where in 'ids_b' this element is located
        j = ids_b.index(id_a)
        # store in that position the value taken from 'vals_in_a'
        vals_in_b[j] = vals_in_a[i]

Upvotes: 0

Views: 264

Answers (2)

Paul Panzer

Reputation: 53029

In defense of my approach, here is the authoritative implementation:

import itertools as it

def pp():
    la,lb = len(ids_a),len(ids_b)
    ids = np.fromiter(it.chain(ids_a,ids_b),'<S6',la+lb)
    unq,inv = np.unique(ids,return_inverse=True)
    vals = np.empty(la,vals_in_a.dtype)
    vals[inv[:la]] = vals_in_a
    return vals[inv[la:]]

(juanpa()==pp()).all()
# True

from timeit import timeit

timeit(juanpa, number=100)
# 3.1373191522434354
timeit(pp, number=100)
# 2.5256317732855678
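
To see why the inverse index from np.unique does the matching, here is a tiny worked illustration of the same trick (my own toy data, not from the benchmark above): the inverse places both lists in one shared index space over the uniques.

```python
import numpy as np

a = ['x', 'y', 'z']          # stand-in for ids_a
b = ['z', 'x']               # stand-in for ids_b
unq, inv = np.unique(np.array(a + b), return_inverse=True)
# inv[:len(a)] are the positions of a's elements among the uniques,
# inv[len(a):] are the positions of b's elements in the same index space.
vals = np.empty(len(a))
vals[inv[:len(a)]] = [1.0, 2.0, 3.0]   # floats associated with ids_a
print(vals[inv[len(a):]])              # floats for ids_b: [3. 1.]
```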

That said, @juanpa.arrivillaga's suggestion can also be implemented better:

import operator as op

def ja():
    return op.itemgetter(*ids_b)(dict(zip(ids_a,vals_in_a)))

(ja()==pp()).all()
# True
timeit(ja,number=100)
# 2.015202699229121
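
One caveat worth noting (my own observation, not from the answer): op.itemgetter raises KeyError if any id in ids_b is missing from ids_a. That cannot happen here because ids_b is sampled from ids_a, but if missing ids were possible, a dict.get variant with a default would be a safer sketch:

```python
def ja_safe(ids_a, vals_in_a, ids_b, default=0.0):
    # Same dict-based lookup, but missing ids map to 'default'
    # instead of raising KeyError.
    lookup = dict(zip(ids_a, vals_in_a))
    return [lookup.get(i, default) for i in ids_b]
```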

Upvotes: 1

Gabriel

Reputation: 42329

I tried the approaches by juanpa.arrivillaga and Paul Panzer. The first is by far the fastest, and also the simplest. The second is faster than my original approach, but considerably slower than the first. It also has the drawback that the line vals[inv_a] = vals_in_a stores floats into a U5 (unicode string) array, converting them into strings. They can be converted back to floats at the end, but digits are lost in the truncation (unless I'm missing something obvious, of course).
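
A minimal demonstration of that drawback (my own toy example): assigning a float into a fixed-width unicode array stores it as a truncated string, so precision is lost on the round trip.

```python
import numpy as np

unq = np.array(['123', '45678'])  # dtype '<U5', like the array from np.unique
unq[0] = 0.123456789              # stored via str(), cut to 5 characters
print(unq[0])                     # '0.123' -- the remaining digits are gone
print(float(unq[0]))              # 0.123, not 0.123456789
```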

Here are the implementations:

def juanpa():
    dict_ids_b = {_: i for i, _ in enumerate(ids_b)}
    for i, id_a in enumerate(ids_a):
        try:
            vals_in_b[dict_ids_b[id_a]] = vals_in_a[i]
        except KeyError:
            pass

    return vals_in_b


def Paul():
    # 1) concatenate ids_a and ids_b
    ids_ab = ids_a + ids_b
    # 2) apply np.unique with keyword return_inverse=True
    vals, idxs = np.unique(ids_ab, return_inverse=True)
    # 3) split the inverse into inv_a and inv_b
    inv_a, inv_b = idxs[:len(ids_a)], idxs[len(ids_a):]
    # 4) map the values to match the order of uniques: vals[inv_a] = vals_in_a
    vals[inv_a] = vals_in_a
    # 5) use inv_b to pick the correct values: result = vals[inv_b]
    vals_in_b = vals[inv_b].astype(float)

    return vals_in_b

Upvotes: 0
