MAR

Reputation: 103

Use numpy to get count of intersecting elements in list of arrays (avoid for loop)

I have an array of values called MyFruits like:

[apple, orange, banana, apple, pear]

Then I have a list of arrays like:

[apple, orange]
[blueberry, watermelon, pear]
[grape, orange, grape, orange]
[]
[cantaloupe]

For each of the arrays in the list, I want to get the count of elements that intersect with MyFruits array divided by the total number of elements in the array. So the output would be:

2 / 2 = 1
1 / 3 = 0.33333
2 / 4 = 0.5
0 / 0 = (in this case 0)
0 / 1 = 0

essentially:

[1, 0.33333, 0.5, 0, 0]

I've been doing this in Python with for loops, but the data set is huge and it's incredibly slow. Someone suggested using numpy, but I'm having difficulty understanding.

Upvotes: 3

Views: 353

Answers (2)

Pascal Getreuer

Reputation: 3256

Suppose you have two lists, one of length M and another of length N. With straightforward linear searches, it would take O(M * N) string comparisons to find which elements are in both lists.

You can improve on that using Python sets. Convert the lists to Python sets and use set intersection (&) to find their common elements. Since set lookups take constant time on average, the complexity then reduces to O(M + N) on average.
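As a rough sketch of how that could look for the data in the question (the names `intersect_ratios` and `fruit_set` are made up for illustration, and the inputs are assumed to be plain Python lists of strings): the question counts repeated elements, e.g. orange twice in the third list, so this version builds a set only from MyFruits and tests each element for membership. Intersecting two sets with & would collapse those duplicates.

```python
def intersect_ratios(my_fruits, list_of_arrays):
    """For each array, return the fraction of its elements found in my_fruits."""
    fruit_set = set(my_fruits)  # built once; membership tests are O(1) on average
    ratios = []
    for arr in list_of_arrays:
        if len(arr) == 0:
            ratios.append(0.0)  # the question treats 0 / 0 as 0
            continue
        # Count every element of arr (duplicates included) that is in my_fruits
        hits = sum(1 for item in arr if item in fruit_set)
        ratios.append(hits / len(arr))
    return ratios


my_fruits = ["apple", "orange", "banana", "apple", "pear"]
data = [
    ["apple", "orange"],
    ["blueberry", "watermelon", "pear"],
    ["grape", "orange", "grape", "orange"],
    [],
    ["cantaloupe"],
]
ratios = intersect_ratios(my_fruits, data)
```

This still loops in Python, but each array is scanned only once and no element is compared against the whole of MyFruits.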

Upvotes: 1

antont

Reputation: 2756

Is this any better than what you have, or the same?

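ratios = []
fruits = set(myFruits)  # drop duplicates (e.g. the second apple) so nothing is counted twice

for d in data:
  count = 0
  for fruit in fruits:
    count += d.count(fruit)  # occurrences of this fruit in d
  ratio = count / (len(d) or 1)  # (len(d) or 1) avoids dividing by zero for an empty list
  ratios.append(ratio)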

Executable in replit: https://repl.it/@ToniAlatalo/CountOccurrences

I don't think numpy can help here, since it's meant for numerical processing, but maybe there is a better way to write what you need otherwise.

Upvotes: 0
