Use numpy to get count of intersecting elements in list of arrays (avoid for loop)

Question

I have an array of values called MyFruits like:

[apple, orange, banana, apple, pear]

Then I have a list of arrays like:

[apple, orange]
[blueberry, watermelon, pear]
[grape, orange, grape, orange]
[]
[cantaloupe]

For each of the arrays in the list, I want to get the count of elements that intersect with MyFruits array divided by the total number of elements in the array. So the output would be:

2 / 2 = 1
1 / 3 = 0.33333
2 / 4 = 0.5
0 / 0 = 0 (defined as 0 in this case)
0 / 1 = 0

essentially:

[1, 0.33333, 0.5, 0, 0]

I've been doing this in Python with for loops, but the data set is huge and it's incredibly slow. Someone suggested using numpy, but I'm having difficulty understanding.

Pascal Getreuer · Accepted Answer

Suppose you have two lists, one of length M and another of length N. With straightforward linear searches, it takes O(M * N) string comparisons to find which elements are in both lists.

You can improve on that using Python sets. Convert the lists to Python sets and use set intersection (&) to find their common elements. Because set lookups are hash-based, the complexity then reduces to O(M + N) on average.
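As a minimal sketch of the set idea applied to the question's data: note that a plain `set(a) & set(b)` collapses duplicates, so for [grape, orange, grape, orange] it would report one common element rather than the two matching entries the question counts. The version below therefore builds a set only from MyFruits and counts how many entries of each array are members of it. The function name `match_fractions` and the variable names are hypothetical, chosen for illustration, and the 0 / 0 case is assumed to be 0 as the question requests.

```python
def match_fractions(my_fruits, arrays):
    """For each array, return (entries found in my_fruits) / (array length)."""
    wanted = set(my_fruits)  # built once; membership tests are O(1) on average
    fractions = []
    for arr in arrays:
        if len(arr) == 0:
            # The question defines 0 / 0 as 0 for empty arrays.
            fractions.append(0.0)
            continue
        # Count every entry that appears in my_fruits, duplicates included.
        hits = sum(1 for item in arr if item in wanted)
        fractions.append(hits / len(arr))
    return fractions


my_fruits = ["apple", "orange", "banana", "apple", "pear"]
arrays = [
    ["apple", "orange"],
    ["blueberry", "watermelon", "pear"],
    ["grape", "orange", "grape", "orange"],
    [],
    ["cantaloupe"],
]

result = match_fractions(my_fruits, arrays)
print(result)
```

This still loops over the arrays in Python, but each membership test is a hash lookup rather than a scan of MyFruits, so the total work is roughly proportional to the size of MyFruits plus the combined length of all the arrays.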
